Description



[0001] The present invention relates to the detection of nucleic acid sequences in two or more collections of nucleic acids.



[0002] The relationship between structure and function of macromolecules is of fundamental importance in the understanding of [...]. These relationships are important to understanding, for example, the functions of [...], the ways in which cells communicate with each other, [...].

[0003] Genetic information is critical in continuation of life processes. Life is substantially informationally based, and its genetic content controls the growth and reproduction of the organism and its [...]. [...] systems, encoded by the genetic material of the cell, [...]. The properties of enzymes, functional proteins, and structural proteins are determined by the sequence of amino acids which make them up. As structure and function are integrally related, many biological functions may be explained [...]. [...] determine the genetic sequences of nucleotides which encode the enzymes, structural proteins, and other effectors of biological functions. In addition to segments of nucleotides which encode polypeptides, there are many nucleotide sequences which are involved in control and regulation of gene expression.

[0004] The human genome project is directed toward determining the complete sequence of the human genome. [...] Although such a sequence would not correspond to the sequence of any individual, it would provide significant information as to the general organization and specific sequences contained within segments from particular individuals. It would also provide mapping information which is very useful for [...] detailed studies [...].

[0005] The methods available today for sequencing include the Sanger dideoxy method, see, e.g., Sanger et al. (1977) Proc. Natl. Acad. Sci. USA, 74:5463-5467, and the Maxam and Gilbert method, see, e.g., Maxam et al. (1980) Methods in Enzymology, 65:499-559. The Sanger method utilizes enzymatic elongation procedures with chain-terminating nucleotides. The Maxam and Gilbert method uses chemical reactions to generate nucleotide-specific cleavages. Both methods require a practitioner to perform a large number of complex manual manipulations. These manipulations usually require isolating homogeneous DNA fragments, tedious preparation of samples, preparing a separating gel, applying samples to the gel, electrophoresing samples into this gel, working up the finished gel, and analyzing the results of the procedure.

[0006] The present invention provides a method for detecting nucleic acid sequences in two or more collections of nucleic acids, comprising:

(a) providing an array comprising more than 100 different polynucleotide probes bound to a solid surface; 

(b) contacting said array of probes under hybridisation conditions with: 

(i) a first collection of nucleic acids comprised of first-labelled nucleic acids having at least some sequences complementary to probes of said array; and

(ii) a second collection of nucleic acids comprised of second-labelled nucleic acids having at least some sequences complementary to probes of said array;

wherein said first and second labels are distinguishable from each other; and 

(c) detecting hybridisation of first and second labelled complementary nucleic acids to probes of said array. 

[0007] In preferred embodiments, said first and second labels are fluorescent labels that emit light of different wavelengths.
[0008] The method of the invention may be used to fingerprint at least first and second cells, wherein said first collection of nucleic acids is from a first cell and said second collection of nucleic acids is from a second cell, and the hybridisation of the first and second labels [...] is detected. Optionally, the method of the invention further comprises:



(a) determining levels of gene expression in said first and second cells 

(b) determining patterns of gene expression in said first and second cells 

(c) determining genetic differences between said first and second cells 
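The method only requires that hybridisation of the two labels be detected; how the two signals are compared is left open. As a minimal sketch, assuming each probe yields one background-corrected intensity per label channel, a per-probe log2 ratio is one common way to compare the two collections, for example when determining relative levels of gene expression in two cells. The function names, the pseudocount and the two-fold cutoff are illustrative assumptions, not part of the described method.

```python
import math

def log2_ratios(first, second, pseudocount=1.0):
    """Per-probe log2(first / second) for a two-colour hybridisation.

    `first` and `second` map a probe identifier to the intensity measured in
    each label's channel. The pseudocount (an assumed choice) keeps the ratio
    finite when one channel reads zero.
    """
    return {probe: math.log2((first[probe] + pseudocount) / (second[probe] + pseudocount))
            for probe in first.keys() & second.keys()}

def differential_probes(ratios, threshold=1.0):
    """Probes whose signal differs at least 2**threshold-fold between collections."""
    return sorted(p for p, r in ratios.items() if abs(r) >= threshold)

# Hypothetical intensities for three probes on the same array.
cell_1 = {"probe_a": 1023.0, "probe_b": 255.0, "probe_c": 99.0}
cell_2 = {"probe_a": 255.0, "probe_b": 255.0, "probe_c": 399.0}
r = log2_ratios(cell_1, cell_2)
print(differential_probes(r))  # ['probe_a', 'probe_c']
```

Because both collections hybridise to the same array, a ratio taken per probe cancels probe-to-probe differences in synthesis yield that would affect each channel equally.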



EP 0 834 576 B1 



[0009] The first and second cells may be different types of cells, optionally wherein:

(a) at least one cell type is a tumour cell or other cell exhibiting abnormal physiology 

(b) said first and second cells are at different stages of development 

(c) said first and second cells are at different stages of infection or other disease or 

[0010] In other embodiments, at least one collection of nucleic acids may be synthesized by fluorescently labelling:

(a) RNA isolated, generated or amplified from said cell; or 

(b) DNA isolated, generated or amplified from said cell.

[0011] The solid surface is preferably a polymeric substrate or includes fibers.

[0012] Polynucleotide probes may be bound to the solid surface at a density of at least 10^3, preferably at least 10^4, more preferably at least 10^5, even more preferably at least 10^6 regions per cm^2 [...].

[0013] In certain embodiments the solid surface may be formed as a collection of beads, and each [...] a single bead. In such embodiments each bead may further comprise an encoding system bound thereto such that the sequence of the polynucleotide bound to a bead can be determined by reading the encoding system, optionally wherein said encoding system is selected from the group consisting of a magnetic [...] system, [...] a colour encoding system, or [...].

[0014] The array of polynucleotide probes may comprise more than 10^3, preferably more than 10^4, more preferably more than 10^5, even more preferably more than 10^6 different probes bound to the solid surface.

[0015] [...] greater than about 15, preferably greater than [...].

[0016] In certain preferred embodiments at least said two collections of nucleic acids are hybridised to the same array.
[0017] The array may be recycled for use. 

[0018] The sequences of polynucleotide probes of the array may be known.

[...] required and automating most of the steps, the speed, accuracy, and reliability of these procedures are greatly enhanced.
[0020] The production of a substrate having a matrix of positionally defined regions with attached reagents exhibiting [...]

[0021] The automation of the substrate production method and of the scan and analysis steps minimizes the need for human intervention. This simplifies the tasks and promotes reproducibility.

[0022] The method of the invention employs a composition comprising a plurality of positionally distinguishable sequence-specific reagents attached to a solid substrate, which are capable of specifically binding to a predetermined subunit sequence of a preselected multi-subunit length having at least three subunits, said reagents representing [...] possible sequences of preselected length. [...] is a polynucleotide sequence. In other embodiments, the specific reagent is an oligonucleotide of at least about five nucleotides, [...] at least [...] nucleotides, more preferably at least 12 nucleotides. [...] a solid substrate, and the reagents comprise at least 3000 different sequences. In [...] embodiments, the reagents represent at least about 25% of the possible subsequences of said preselected length. Usually, the reagents are localized in regions of the substrate having a density of at least 25 regions per [...] the substrate has a surface area of less than about 4 square centimeters. By way [...]

[...] The methods of the invention may be used for personal identification, genetic screening, identification of pathological conditions, determination of patterns of specific gene expression, and others.

[...] target sequence would typically be through a fluorescent label. [...] a fluorescent label is probably most convenient, other sorts of labels, e.g., radioactive, enzyme linked, optically detectable, or spectroscopic labels, may be used. Because the oligonucleotide probes are positionally defined, the location of the hybridized duplex can directly translate to the sequences which hybridize. Thus, [...] collection of subsequences [...] the target sequence. These subsequences may be matched with respect to their overlaps so as to assemble an intact target sequence.
[0024] Preferred embodiments of the invention will now be described by way of examples and with reference to the drawings, in which:

[0025] Fig. 1 illustrates a flow chart for sequence, fingerprint, or mapping analysis. 
[0026] Fig. 2 illustrates the proper function of a VLSIPS nucleotide synthesis. 
[0027] Fig. 3 illustrates the proper function of a VLSIPS dinucleotide synthesis. 
[0028] Fig. 4 illustrates the process of a VLSIPS trinucleotide synthesis. 

I. Overall Description 

A. general 

B. VLSIPS substrates 

C. binary masking 

D. applications 

E. detection methods and apparatus 

F. data analysis 

II. Theoretical Analysis 

A. simple n-mer structure; theory 

B. complications 

III. Polynucleotide Sequencing 

A. preparation of substrate matrix 

B. labeling target polynucleotide 

C. hybridization conditions 

D. detection; VLSIPS scanning 

E. analysis 

F. substrate reuse 

IV. Fingerprinting 

A. general 

B. preparation of substrate matrix 

C. labeling target nucleotides 

D. hybridization conditions 

E. detection; VLSIPS scanning 

F. analysis 

G. substrate reuse 

H. other polynucleotide aspects 

V. Mapping 

A. general 

B. preparation of substrate matrix 

C. labeling 

D. hybridization/specific interaction 

E. detection 

F. analysis 

G. substrate reuse 

VI. Additional Screening 

A. specific interactions 

B. sequence comparisons 

C. categorizations 

D. statistical correlations 
VII. Formation of Substrate

A. instrumentation 

B. binary masking 

C. synthetic methods 

D. surface immobilization 



VIII. Hybridization/Specific Interaction

A. general 

B. important parameters 



IX. Detection Methods 

A. labeling techniques 

B. scanning system 

X. Data Analysis 



A. general 

B. hardware 

C. software 

XI. Substrate Reuse 



A. removal of label 

B. storage and preservation 

C. processes to avoid degradation of oligomers 

XII. Integrated Sequencing Strategy 

A. initial mapping strategy 

B. selection of smaller clones 



XIII. Commercial Applications 

A. sequencing 

B. fingerprinting 

C. mapping 

I. OVERALL DESCRIPTION 
A. General 

[0029] The present invention relies in part on the ability to synthesize or attach specific recognition reagents at known 
locations on a substrate, typically a single substrate. In particular, the present invention provides the ability to prepare 
a substrate having a very high density matrix pattern of positionally defined specific recognition reagents. The reagents 
are capable of interacting with their specific targets while attached to the substrate, e.g., solid phase interactions, and 
by appropriate labeling of these targets, the sites of the interactions between the target and the specific reagents may 
be derived. Because the reagents are positionally defined, the sites of the interactions will define the specificity of each 
interaction. As a result, a map of the patterns of interactions with specific reagents on the substrate is convertible into
information on the specific interactions taking place, e.g., the recognized features. Where the specific reagents recognize
a large number of possible features, this system allows the determination of the combination of specific interactions
which exist on the target molecule. Where the number of features is sufficiently large, the identical combination,
or pattern, of features is sufficiently unlikely that a particular target molecule may often be uniquely defined by its
features. In the extreme, the features may actually be the subunit sequence of the target molecule, and a given target 
sequence may be uniquely defined by its combination of features. 

[0030] The methodology is applicable to sequencing polynucleotides. The specific sequence recognition reagents 
will typically be oligonucleotide probes which hybridize with specificity to subsequences found on the target sequence.
A sufficiently large number of those probes allows the fingerprinting of a target polynucleotide or the relative mapping
of a collection of target polynucleotides, as described in greater detail below.

[0031] In the high resolution fingerprinting provided by a saturating collection of probes which include all possible [...], e.g., 10-mers, [...] collating of all [...] and determination of specific overlaps [...] will be derived, and the entire sequence can usually be reconstructed.

[0032] Sequence analysis may take the form of complete sequence determination, to the level of the sequence of individual subunits along the entire length of the target sequence. Sequence analysis also may take the form of sequence homology, e.g., less than absolute subunit resolution, where "similarity" in the sequence will be detectable, or [...] of selective sequences of homology interspersed at specific or irregular locations.
[0033] In either case, the sequence is determinable at selective resolution or at particular locations. Thus, the hybridization method will be useful as a means for identification, e.g., a "fingerprint", much like a [...] method is used. It is also useful to map particular target sequences.

B. VLSIPS Substrates 

[0034] The invention is enabled by the development of technology to prepare substrates on which specific reagents [...] attached or synthesized. In particular, the very large scale immobilized polymer synthesis (VLSIPS) technology allows for the very high density production of an enormous diversity of reagents mapped out [...] a substrate. [...] reagents [...] bind thereto, producing a map of positionally defined regions of interaction. These map positions are convertible into actual features recognized, which would thus be present in the target molecule of interest.

[0035] As indicated, the sequence specific recognition reagents will often be oligonucleotides which hybridize with fidelity and discrimination to the target sequence.

[0036] In the generic sense, the VLSIPS technology allows the production of a substrate with a high density matrix of positionally mapped regions with specific recognition reagents attached at each distinct region. By use of protecting groups which can be positionally removed, or added, the regions can be activated or deactivated for addition of particular reagents or compounds. Details of the protection are described below and in PCT publication no. WO90/15070, published December 13, 1990. In a preferred embodiment, photosensitive protecting agents will be used, and the activation or deactivation of regions may be controlled by electro-optical and optical methods, similar to many of the processes used in semiconductor wafer and chip fabrication.

[0037] In a nucleic acid sequencing application, a VLSIPS substrate is synthesized having positionally defined oligonucleotide probes. See PCT publication no. WO90/15070, published December 13, 1990. By use of masking technology and photosensitive synthetic subunits, the VLSIPS apparatus allows for the stepwise synthesis of polynucleotides according to a positionally defined matrix pattern. Each oligonucleotide probe is synthesized at defined positional locations on the substrate. This forms a matrix pattern of known relationship between position and specificity of interaction. The VLSIPS technology allows a very large number of different oligonucleotide probes to be simultaneously and automatically synthesized, including numbers in excess of about [...], at densities of at least about 10^2 [...] and up to [...] or more.

[...] synthesizing polymers on a silicon or other suitably derivatized substrate, methods [...] synthesizing specific [...] of biological polymers on those substrates, apparatus for scanning [...] at specific locations on the substrate, and various other technologies related to the use of a high density very large scale immobilized polymer substrate. In particular, sequencing, fingerprinting, and mapping applications are discussed herein in [...].

[0038] The regions which define particular reagents will usually be generated by selective protecting groups which [...]. Typically, the protecting group will be bound to a monomer or [...] by an activator, such as electromagnetic radiation. Examples of protective groups with [...] include nitroveratryloxycarbonyl (NVOC), [...]oxycarbonyl, or [...]dimethoxy[...].

C. Binary Masking 

[0039] There are various particular ways to optimize the synthetic processes.

[0040] Briefly, the binary synthesis strategy refers to an ordered strategy for parallel synthesis of diverse polymer sequences by sequential addition of reagents, which may be represented by a reactant matrix and a switch matrix [...]. A reactant matrix is a 1 x n matrix of the building blocks to be added. The switch matrix is all or a subset of the binary numbers from 1 to n arranged in columns. In preferred embodiments, a binary strategy is one in which at least two successive steps illuminate half of a region of interest on the substrate.
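A minimal sketch of how a reactant matrix and a switch matrix combine into products, assuming the reactants are added in a fixed order and that the switch matrix contains every n-bit binary number as a column (the text also allows a subset). Each row is one synthesis step, each column one substrate region, and a 1 means that region is illuminated and receives that step's building block. The helper name is hypothetical.

```python
def binary_synthesis(reactants):
    """Products of a full binary masking strategy (hypothetical helper).

    Row `step` of the switch matrix marks with 1 the regions illuminated at
    that step, i.e. the regions that receive reactants[step]. Columns are
    assumed to be every n-bit binary number, one column per region.
    """
    n = len(reactants)
    regions = 2 ** n
    switch = [[(col >> (n - 1 - step)) & 1 for col in range(regions)]
              for step in range(n)]
    # Each region's product is the ordered string of reactants it received.
    products = ["".join(reactants[step] for step in range(n) if switch[step][col])
                for col in range(regions)]
    return switch, products

switch, products = binary_synthesis(["A", "C", "G"])
print(products)  # ['', 'G', 'C', 'CG', 'A', 'AG', 'AC', 'ACG']
```

With every n-bit column present, each step illuminates exactly half of the regions, which matches the preferred binary strategy described above.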

[...] nucleotide sequence probes of a given length.
D. Applications



TABLE I

VLSIPS PROJECT IN NUCLEIC ACIDS

I. Construction of Chips
II. Applications
    A. Sequencing
        1. Primary sequencing
        2. Secondary sequencing (sequence checking)
        3. Large scale mapping
        4. Fingerprinting
    B. Duplex/Triplex formation
        1. Antisense
        2. Sequence specific function modulation (e.g. promoter inhibition)
    C. Diagnosis
        1. Genetic markers
        2. Type markers
            a. Blood donors
            b. Tissue transplants
    D. Microbiology
        1. Clinical microbiology
        2. Food microbiology
III. Instrumentation
    A. Chip machines
    B. Detection
IV. Software Development
    A. Instrumentation software
    B. Data reduction software
    C. Sequence analysis software



[0047] The fingerprinting analysis may be used to perform various types of genetic screening. For example, a single substrate may be generated with a plurality of screening probes, allowing for the simultaneous genetic screening for [...] markers. Thus, prenatal or diagnostic screening can be simplified, economized, and made more generally accessible.

[0048] In addition to the sequencing, fingerprinting, and mapping applications, the present invention also provides 
means for determining specificity of interaction with particular sequences. 

E. Detection Methods and Apparatus 

[0049] An appropriate detection method applicable to the selected labeling method can be selected. Suitable labels include radionucleotides, enzymes, substrates, cofactors, inhibitors, magnetic particles, heavy metal atoms, and particularly fluorescers, chemiluminescers, and spectroscopic labels. Patents teaching the use of such labels include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241.
[0050] With an appropriate label selected, the detection system best adapted for high resolution and high sensitivity detection may be selected. As indicated above, an optically detectable system, e.g., fluorescence or chemiluminescence, would be preferred. Other detection systems may be adapted to the purpose, e.g., electron microscopy, scanning electron microscopy (SEM), scanning tunneling electron microscopy (STEM), infrared microscopy, atomic force microscopy (AFM), electrical conductance, and image plate transfer.
[0051] With a detection method selected, an apparatus for scanning the substrate will be designed. Apparatus as described in PCT publication no. WO90/15070, published December 13, 1990, is particularly appropriate. Design modifications may also be incorporated therein.

F. Data Analysis 

[0052] Data is analyzed by processes similar to those described below in the section describing theoretical analysis. 
More efficient algorithms will be mathematically devised, and will usually be designed to be performed on a computer. [...] quickly and efficiently make measurement samples and [...] signal [...].
[0053] The initial data resulting from the detection system is an array of data indicative of fluorescent intensity versus location on the substrate. The data are typically taken over regions substantially smaller than the area in which synthesis of a given polymer has taken place. Merely by way of example, if polymers were synthesized in squares on the substrate having dimensions of 500 microns by 500 microns, the data may be taken over regions having dimensions of 5 microns by 5 microns. In preferred embodiments, the regions over which fluorescence data are taken across the substrate are less than about 1/2 the area of the regions in which individual polymers are synthesized, more preferably less than 1/10 the area in which a single polymer is synthesized, and most preferably less than 1/100 the area in which a single polymer is synthesized. Hence, within any area in which a given polymer has been synthesized, a large number of fluorescence data points are collected.
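For the example dimensions above, the number of data points collected per synthesis region works out as:

```latex
\frac{(500\,\mu\mathrm{m})^2}{(5\,\mu\mathrm{m})^2} = \frac{250\,000}{25} = 10\,000 \quad \text{data points per synthesis region}
```

so each 5 micron by 5 micron data region covers 1/10,000 of the synthesis area, well below the most preferred limit of 1/100.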

[0054] A plot of number of pixels versus intensity for a scan should bear a rough resemblance to a bell curve, but spurious data are observed, particularly at higher intensities. Since it is desirable to use an average of fluorescent intensity over a given synthesis region in determining relative binding affinity, these spurious data will tend to undesirably skew the data.
[0055] Accordingly, in one embodiment of the invention the data are corrected for removal of these spurious data points, and an average of the data points is thereafter utilized in determining relative binding efficiency. In general, the data are fitted to a base curve and statistical measures are used to remove spurious data.
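The text does not specify the base curve or the statistical measure. As one hypothetical choice, the sketch below rejects pixels lying more than a few scaled median absolute deviations (MAD) from the region's median before averaging; the MAD rule, the cutoff of 3 and the 1.4826 normal-consistency factor are assumptions swapped in for illustration, not the procedure of the source.

```python
import statistics

def robust_region_mean(pixels, cutoff=3.0):
    """Mean intensity of one synthesis region after discarding spurious pixels.

    Assumed rule: keep pixels within `cutoff` scaled MADs of the median.
    """
    median = statistics.median(pixels)
    mad = statistics.median([abs(x - median) for x in pixels])
    if mad == 0:
        # At least half the pixels equal the median; fall back to the median.
        return float(median)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for normal data
    kept = [x for x in pixels if abs(x - median) <= cutoff * scale]
    return statistics.fmean(kept)

# Hypothetical pixel intensities from one region, with one spurious bright pixel.
region = [100, 102, 98, 101, 99, 100, 500]
print(statistics.fmean(region))    # plain mean, pulled up by the outlier
print(robust_region_mean(region))  # 100.0
```

Median-based statistics are used here because a few very bright pixels shift a mean and standard deviation but barely move the median.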

[0056] In an additional analytical tool, various degeneracy reducing analogues may be incorporated in the hybridization [...], as described, e.g., in Macevicz, S. (1990) PCT publication number WO 90/04652.

II. THEORETICAL ANALYSIS 

[0057] The principle of the hybridization sequencing procedure is based, in part, upon the ability to determine overlaps 
of short segments. The VLSIPS technology provides the ability to generate reagents which will saturate the possible 
short subsequence recognition possibilities. The principle is most easily illustrated by using a binary sequence such 
as a sequence of zeros and ones. Once having illustrated the application to a binary alphabet, the principle may easily 
be understood to encompass three letter, four letter, five or more letter, even 20 letter alphabets. A theoretical treatment 
of applying subsequence information to the reconstruction of a target sequence is provided, e.g., in Lysov, Yu., et al. (1988) Dokl. Akad. Nauk SSSR 303:1508-1511; Khrapko, K., et al. (1989) FEBS Letters 256:118-122; Pevzner, P. A. (1989) J. Biomolecular Structure and Dynamics 7:63-69; and Drmanac, R., et al. (1989) Genomics 4:114-128.
[0058] The reagents for recognizing the subsequences will usually be specific for recognizing a particular polymer [...] a target polymer. It is preferable that conditions be selected which discriminate between high fidelity matching and very low levels of mismatching. The reagent interaction will preferably
exhibit no sensitivity to flanking sequences, to the subsequence position within the target, or to any other remote 
structure within the sequence. 

A. Simple n-mer Structure: Theory 

1. Simple two letter alphabet: example 

[0059] A simple example is presented below of how a sequence of ten digits comprising zeros and ones would be sequenceable using short segments of five digits. For example, consider the sample ten digit sequence:

1010011100. A VLSIPS substrate could be constructed, as discussed elsewhere, which would have reagents attached in a defined matrix pattern which specifically recognize each of the possible five digit sequences of ones and zeros. The number of possible five digit subsequences is 2^5 = 32. The number of possible different sequences 10 digits long is 2^10 = 1024.

[0060] The contiguous five digit subsequences within a ten digit sequence number six, i.e., positioned at digits 1-5, 2-6, 3-7, 4-8, 5-9, and 6-10. It will be noted that the specific order of the digits in the sequence is important and that the order is directional, e.g., running left to right versus right to left. The first five digit sequence contained in the target sequence is 10100, the second is 01001, the third is 10011, the fourth is 00111, the fifth is 01110, and the sixth is 11100.

[0061] The VLSIPS substrate would have a matrix pattern of positionally attached reagents which recognize each of the 32 possible five digit subsequences. Those reagents which recognize each of the 6 contained 5-mers will bind the target, and a label allows the positional determination of where the sequence specific interaction has occurred. By reference to position in the matrix pattern, the bound subsequences can be determined.

For the above-mentioned sequence, six different 5-mer sequences would be determined to be present. They would be:



10100 
01001 
10011 
00111 
01110
11100 
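The decomposition of the example target into its contiguous five digit windows, and the 32-probe array that would detect them, can be sketched in a few lines; `spectrum` and the other names are hypothetical.

```python
def spectrum(sequence, k):
    """Contiguous length-k windows of `sequence`, in positional order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# One probe for every possible five digit binary subsequence: 2**5 = 32 positions.
array_probes = [format(i, "05b") for i in range(2 ** 5)]

target = "1010011100"
windows = spectrum(target, 5)
positives = [p for p in array_probes if p in windows]  # probes that would light up

print(windows)  # ['10100', '01001', '10011', '00111', '01110', '11100']
print(len(array_probes), len(positives))  # 32 6
```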



[0062] Any sequence which contains the first five digit sequence, 10100, already narrows the number of possible sequences (from the 1024 possible sequences) which contain it to less than about 192 possible sequences.

[0063] This 192 is derived from the observation that with the subsequence 10100 at the far left of the sequence, in positions 1-5, there are only 32 possible sequences. Likewise, there are 32 possible sequences with that subsequence in positions 2-6, 3-7, 4-8, 5-9, and 6-10. So, to sum up all of the sequences that could contain 10100, there are 32 for each position and 6 positions, giving about 192 possible sequences. However, some of these 10 digit sequences will have been counted twice, e.g., 1010010100, which contains 10100 in both positions 1-5 and 6-10. Thus, by virtue of containing the 10100 subsequence, the number of possible 10-mer sequences has been decreased from 1024 sequences to less than about 192 sequences.
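The count above can be checked by brute force over all 2^10 sequences. No prefix of 10100 equals a suffix of the same length, so two occurrences in a ten digit sequence cannot overlap and must sit at positions 1-5 and 6-10; the only sequence counted twice is therefore 1010010100, and the exact number is 192 - 1 = 191. The helper name is hypothetical.

```python
from itertools import product

def count_containing(pattern, length, alphabet="01"):
    """Brute-force count of sequences of `length` that contain `pattern`."""
    return sum(pattern in "".join(s) for s in product(alphabet, repeat=length))

print(6 * 2 ** 5)                     # 192: six positions x 32 fillings each
print(count_containing("10100", 10))  # 191: 1010010100 was counted twice
```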

[0064] In this example, not only do we know that the sequence contains 10100, but we also know that it contains the second five character sequence, 01001. By virtue of knowing that the sequence contains 10100, we can look specifically to determine whether the sequence contains a subsequence of five characters which contains the four leftmost digits of 10100 plus a next digit to the left. For example, we would look for a sequence of X1010, but we find that there is none. Thus, we know that the 10100 must be at the left end of the 10-mer. We would also look to see whether the sequence contains 0100Y, where Y is either 0 or 1, and we find 01001. Thus, we know that our target sequence has an overlap of 0100 between 10100 and 01001, and that its left end is 101001.

[0065] Applying the same procedure to the second 5-mer, we also know that the sequence must include a sequence 1001Y, where Y is either 0 or 1. We look at the fragments and see that 10011 is present, so Y is also 1. Thus, we know that our sequence begins with 1010011.

[0066] Similarly, we know that there must be a sequence of 0011Z, where Z may be either 0 or 1. Looking at the fragments, we see that the target sequence contains a 00111 subsequence, and Z is 1. Thus, we know the sequence must start with 10100111.

[0067] The next 5-mer must be of the sequence 0111W, where W must be 0 or 1. Again, looking up at the fragments produced, we see that the target sequence contains a 01110 subsequence, and W is a 0. Thus, our sequence so far is 101001110. We know that the last 5-mer must be either 11100 or 11101. Looking at the fragments, we find 11100. Thus, we have determined that the sequence must have been 1010011100.
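The left-end-first walk of [0064] to [0067] can be written as a short loop. This is a sketch under two stated assumptions: each five digit subsequence occurs only once in the target (no internal repeats), and every extension is unambiguous. It raises an error in either case rather than guessing. The function name is hypothetical.

```python
def reconstruct(kmers, alphabet="01"):
    """Rebuild a target from the set of k-mers that hybridized.

    Assumes each k-mer occurs once in the target and every step is
    unambiguous; raises ValueError instead of guessing otherwise.
    """
    k = len(next(iter(kmers)))
    # Left end: no X + (first k-1 digits) is present, e.g. no "X1010" for 10100.
    starts = [m for m in kmers if not any(a + m[:-1] in kmers for a in alphabet)]
    if len(starts) != 1:
        raise ValueError(f"expected one left end, found {len(starts)}")
    sequence, used = starts[0], {starts[0]}
    while len(used) < len(kmers):
        overlap = sequence[-(k - 1):]  # last k-1 digits, e.g. "0100" after 10100
        candidates = [overlap + a for a in alphabet if overlap + a in kmers]
        if len(candidates) != 1 or candidates[0] in used:
            raise ValueError(f"cannot extend {sequence} unambiguously")
        sequence += candidates[0][-1]
        used.add(candidates[0])
    return sequence

fragments = {"10100", "01001", "10011", "00111", "01110", "11100"}
print(reconstruct(fragments))  # 1010011100
```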

[0068] However, it will be recognized from the example above with the sequences provided therein that the sequence analysis can start with any known positive probe subsequence. The determination may be performed by working along the sequence, checking the known sequence with a limited number of next positions. Given this possibility, the sequence may be determined, besides by scanning all possible oligonucleotide probe positions, by specifically looking only where the next possible positions would be. This may increase the complexity of the [...] towards scanning and detecting specific positions of interest [...] sequence possibilities. Thus, the scanning apparatus could be set up to work its way along a sequence from a given contained [...] to look at those positions on the substrate which are to have a positive [...].

[0069] It is seen that, given a sequence, it can be de-constructed into n-mers to produce a set of internal contiguous subsequences. Given a target sequence, we would be able to determine what fragments would result. The hybridization sequence method depends, in part, upon being able to work in the reverse, from a set of fragments of known sequences to the full sequence. In simple cases, one is able to start at a single position and work in both directions towards the ends of the sequence, as illustrated in the example.
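Starting from any positive probe and working outward in both directions, querying only the few candidate positions at each step, can be sketched as follows. `positives` is a hypothetical stand-in for the set of substrate positions that gave a signal, and the same no-repeat assumption as above applies.

```python
def walk_from(seed, positives, alphabet="01"):
    """Extend a sequence right, then left, from any positive probe `seed`.

    Only len(alphabet) candidate probes are queried per step, and each
    direction stops when an extension is missing or ambiguous. Assumes no
    repeated k-mers. Returns (sequence, number of probe queries).
    """
    k = len(seed)
    sequence, queries, seen = seed, 0, {seed}
    while True:  # rightward: <last k-1 digits> + next digit
        hits = []
        for a in alphabet:
            queries += 1
            if sequence[-(k - 1):] + a in positives:
                hits.append(sequence[-(k - 1):] + a)
        if len(hits) != 1 or hits[0] in seen:
            break
        seen.add(hits[0])
        sequence += hits[0][-1]
    while True:  # leftward: previous digit + <first k-1 digits>
        hits = []
        for a in alphabet:
            queries += 1
            if a + sequence[:k - 1] in positives:
                hits.append(a + sequence[:k - 1])
        if len(hits) != 1 or hits[0] in seen:
            break
        seen.add(hits[0])
        sequence = hits[0][0] + sequence
    return sequence, queries

positives = {"10100", "01001", "10011", "00111", "01110", "11100"}
print(walk_from("00111", positives))  # ('1010011100', 14)
```

Starting from 00111, the walk recovers 1010011100 after 14 probe queries, compared with reading all 32 probe positions on the array.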

[0070] The number of possible sequences of a given length increases very quickly with the length of that sequence. Thus, a 10-mer of zeros and ones has 1024 possibilities, a 12-mer has 4096, and [...] over a [...] possible sequences. [...] more preferably at least about 30%, would be desired. Higher percentages would be especially preferred.
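The growth is exponential in length: an alphabet of size a gives a^L sequences of length L. A small sketch follows; the four letter (DNA) figures are added for comparison and are simple arithmetic, not values from the text.

```python
def n_possible(alphabet_size, length):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length

for length in (10, 12, 20, 30):
    print(length, n_possible(2, length))
# 10 1024
# 12 4096
# 20 1048576
# 30 1073741824

# A saturating DNA probe array of n-mers needs 4**n positions; arrays covering
# only a fraction (e.g. at least about 25%) of them are also contemplated.
for n in (8, 10, 12):
    print(n, n_possible(4, n), n_possible(4, n) // 4)
```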
2. Example of four letter alphabet 

[0071] A four letter alphabet may be conceptualized in at least two different ways from the two letter alphabet. One way is to consider the four possible values at each position and to analogize in a similar fashion [...] each of the overlaps. A second way is to group the binary digits into pairs.

[0072] By the first means, overlap comparisons are performed with a four letter alphabet rather than a two letter alphabet. Then, in contrast to the binary system with 10 positions, where there are 2^10 = 1024 possible sequences, for a 4-character alphabet with 10 positions there will actually be [...]



GGC 
GCT
CTA 



[0073] [...] the binary system may be looked at as pairs of digits. The pairs would be 00, 01, 10, and 11. In this manner, the earlier used sequence 1010011100 is looked at as 10,10,01,11,00, each character being two digits selected from the possible universe of the pairs 00, 01, 10, and 11. Probes would then be of an even number of digits, e.g., not five digits, but three pairs of digits, or six digits. A similar comparison is performed and the possible overlaps determined. The 3-pair subsequences are:



10,10,01

10,01,11

01,11,00



and the overlap reconstruction produces 10,10,01,11,00.

[0074] Each of the two conceptual views of the 4 letter alphabet provides a representation which is similar to [...], e.g., 00 corresponds to A, 01 to C, 10 to G, and 11 to T. And, in fact, if such a correspondence is used, both examples for the 4 character sequences can be seen to represent the same target sequence. The applicability of the hybridization method to [...] is easily seen if A is the representation of adenine, C is the representation of cytosine, G is the representation of guanine, and T is the representation of thymine or uracil.
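The pair-to-letter correspondence can be checked directly: reading the binary example two digits at a time gives GGCTA, whose overlapping 3-letter subsequences are exactly GGC, GCT and CTA as listed above. The names are hypothetical.

```python
PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def binary_to_bases(bits):
    """Read a binary string two digits at a time as a four letter sequence."""
    if len(bits) % 2:
        raise ValueError("an even number of digits is required")
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

bases = binary_to_bases("1010011100")
triplets = [bases[i:i + 3] for i in range(len(bases) - 2)]
print(bases, triplets)  # GGCTA ['GGC', 'GCT', 'CTA']
```

Since 4^5 = 2^10 = 1024, the five letter and ten digit views of the example carry the same number of possibilities.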

B. Complications 

[0075] Two obvious complications exist with the method of sequence analysis by hybridization. The firs, results from 
n " e °l ' na PP r °P ria,e ,e "9th while the second relates to internally repeated sequences 

[0076] The first complication arises from a probe length which causes problems with the specificity of recognition. For example, if the recognized sequence is too short, every sequence which is utilized will be recognized by every probe sequence. This occurs, e.g., in a binary system where the probes are each of sequences which occur relatively frequently, e.g., a two character probe for the binary system. Each possible two character probe would be expected to appear 1/4 of the time in every single two character position. Thus, the above example would be recognized by each of the 00, 10, 01, and 11 probes. The sequence resolution is therefore too low, and each recognition reagent specifically binds at multiple sites on the target sequence.
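To make the redundancy concrete, the sketch below counts, for every possible probe of a given length, the number of positions at which it occurs in the binary example 1010011100. It is a minimal illustration in which exact string matching stands in for hybridization; `probe_site_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def probe_site_counts(target: str, probe_len: int, alphabet: str = "01") -> dict[str, int]:
    """For every possible probe of length probe_len, count its occurrences in target.

    Exact string matching stands in for hybridization here.
    """
    occurrences = Counter(target[i:i + probe_len]
                          for i in range(len(target) - probe_len + 1))
    # Report every possible probe, including those that never occur (count 0).
    return {"".join(p): occurrences["".join(p)]
            for p in product(alphabet, repeat=probe_len)}


print(probe_site_counts("1010011100", 2))
# {'00': 2, '01': 2, '10': 3, '11': 2}
print(max(probe_site_counts("1010011100", 6).values()))  # 1
```

With two-digit probes every probe occurs at two or three sites of the target, while with six-digit probes no probe occurs more than once.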

[0077] The number of different probes which bind to a target depends on the relationship between the probe length and the target length. At the extreme of short probe length, the just mentioned problem of excessive redundancy and lack of resolution exists. The lack of stability in recognition will also be a problem with extremely short probes. At the other extreme, with long probes, each entire probe sequence is at a different position on a substrate. However, a problem arises with the number of possible sequences, which goes up dramatically with the length of the sequence. Also, the specificity of recognition begins to decrease, as the contribution to binding by any particular subunit may become sufficiently low that the system fails to distinguish the fidelity of recognition. Mismatched hybridization may be a problem in the sequencing applications, though the fingerprinting and mapping applications may not be so strict in their fidelity requirements. As indicated above, a thirty position binary sequence has well over a million possible sequences (2^30 = 1,073,741,824), and such a system starts to become unreasonably large in its required number of different sequences, even though the target length is still very short. Preparing a substrate with all sequence possibilities for a long target may be extremely difficult due to the many different oligomers which must be synthesized.

[0078] The above example illustrates how a long target sequence may be reconstructed with a reasonably small number of subsequences. Since present day resolution of the regions of substrate having defined oligomer probes attached approaches about 10 microns by 10 microns for resolvable regions, about one million positions can be placed on a one centimeter square substrate. However, high resolution systems may have particular disadvantages which favor the use of a lower density substrate matrix pattern. For this reason, a limited number of probe sequences can be utilized so that any given target sequence may be determined by hybridization to a relatively small number of probes.
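The arithmetic behind the one-million-position figure can be sketched as follows, assuming closely packed square regions with no spacing between them. `sites_per_cm2` and `longest_complete_library` are hypothetical helper names.

```python
def sites_per_cm2(region_um: float) -> int:
    """Square regions of side region_um micrometres tiled over 1 cm x 1 cm, no spacing."""
    per_side = int(10_000 // region_um)  # 1 cm = 10,000 micrometres
    return per_side * per_side


def longest_complete_library(sites: int) -> int:
    """Largest k such that all 4**k nucleotide k-mers fit in the given number of sites."""
    k = 0
    while 4 ** (k + 1) <= sites:
        k += 1
    return k


sites = sites_per_cm2(10)
print(sites)                            # 1000000
print(longest_complete_library(sites))  # 9  (4**9 = 262,144 fits; 4**10 = 1,048,576 does not)
```

Under these assumptions a 10 micron pitch holds a complete 9-mer library on one square centimetre, while the 1,048,576 possible 10-mers would need slightly more area.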

[0079] A second complication relates to convergence of sequences to a single subsequence. This will occur when a particular subsequence is repeated in the target sequence. This problem can be addressed in at least two different ways. The first, and simpler, way is to separate the repeat sequences onto two different targets. Thus, each single target will carry only one copy of the repeated sequence and can be analyzed to its end. This solution, however, complicates the analysis by requiring that some means for cutting at a site between the repeats can be located. Typically a careful sequencer would want to have two intermediate cut points so that the intermediate region can also be sequenced in both directions across each of the cut points. This problem is inherent in the hybridization method for sequencing, but can be minimized by using a longer known probe sequence so that the frequency of probe repeats is decreased.
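Whether a target will cause convergence for a given probe length can be checked by looking for repeated overlaps. The sketch below assumes that overlaps between k-character probes are k-1 characters long, and uses a hypothetical target containing the repeat ACGT; `repeated_overlaps` is an illustrative name.

```python
from collections import Counter


def repeated_overlaps(target: str, k: int) -> set[str]:
    """(k-1)-character stretches that occur more than once in the target.

    With probes of length k, overlaps are k-1 characters long, so each repeated
    stretch is a point where the chain of overlaps can branch.
    """
    m = k - 1
    counts = Counter(target[i:i + m] for i in range(len(target) - m + 1))
    return {s for s, n in counts.items() if n > 1}


target = "ACGTCCACGTGG"  # hypothetical target containing the repeat ACGT
for k in (3, 5, 6):
    print(k, sorted(repeated_overlaps(target, k)))
# 3 ['AC', 'CG', 'GT']
# 5 ['ACGT']
# 6 []
```

Lengthening the probes from 3 to 6 characters removes every repeated overlap in this example, which is the effect the longer-probe remedy relies on.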
[0080] Knowing the sequences flanking a repeat will simplify the use of polymerase chain reaction (PCR) or a similar technique to further definitively determine the sequence between sequence repeats. Probes can be made to hybridize to those known sequences adjacent the repeat sequences, thereby producing new target sequences for analysis. See, e.g., Innis et al. (eds.) (1990) PCR Protocols: A Guide to Methods and Applications, Academic Press; and, for the preparation of oligonucleotide probes, see, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford.
[0081] Other means for dealing with convergence problems include using particular longer probes and using degeneracy reducing analogues, see, e.g., Macevicz, S. (1990) PCT publication number WO 90/04652. By use of stretches of degeneracy reducing analogues with other probes in particular combinations, the number of probes necessary to fully saturate the possible oligomer probes is decreased. For example, with a set of 12-mers having the central four positions as degenerate nucleotides, in combination with all of the possible 8-mers, the collection numbers twice the 4^8 = 65,536 possible 8-mers, i.e., 65,536 + 65,536 = 131,072 probes, but the population provides screening equivalent to all possible 12-mers.

[0082] Specifically, all possible oligonucleotide 8-mers may be depicted in the fashion:

N1-N2-N3-N4-N5-N6-N7-N8,

in which there are 4^8 = 65,536 possible 8-mers. Producing all possible 8-mers requires 4x8 = 32 chemical binary synthesis steps. Using degenerate nucleotides, D's, which hybridize nonselectively to any corresponding complementary nucleotide, new oligonucleotide 12-mers can be made in the fashion:

N1-N2-N3-N4-D-D-D-D-N5-N6-N7-N8,

in which there are again, as above, only 4^8 = 65,536 possible "12-mers", which in reality have only 8 specified nucleotides.

[0083] However, it can be seen that each possible 12-mer probe could be represented by a group of the two 8-mer types. Moreover, repeats of less than 12 nucleotides would not converge, or cause repeat problems in the analysis. Thus, instead of requiring a collection of probes corresponding to all 12-mers, or 4^12 = 16,777,216 different 12-mers, the same information can be derived by making 2 sets of "8-mers", consisting of the typical 8-mer collection of 4^8 = 65,536 and the "12-mer" set with the degeneracy reducing analogues, also requiring making 4^8 = 65,536. The combination of the two sets requires making 65,536 + 65,536 = 131,072 different molecules, but gives the information of 16,777,216 molecules. Thus, incorporating the degeneracy reducing analogue decreases the number of molecules necessary to get 12-mer resolution by a factor of 128 (16,777,216 / 131,072 = 128).
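The counting argument for degeneracy reducing analogues, and the way one gapped probe responds to many 12-mers, can be sketched as below. The sketch treats each degenerate position D as a wildcard that pairs with any base and, for simplicity, identifies each probe with the subsequence it detects rather than with its complement; `matches_gapped` is a hypothetical name.

```python
from itertools import product

N_FULL_12MERS = 4 ** 12          # every possible 12-mer: 16,777,216
N_TWO_SETS = 4 ** 8 + 4 ** 8     # plain 8-mers plus gapped "12-mers": 131,072


def matches_gapped(probe8: str, window12: str, gap: int = 4) -> bool:
    """True if a gapped probe N1-N4, D x gap, N5-N8 is satisfied by a target window.

    The D positions are modelled as wildcards, so only the four specified
    bases on each side are compared.
    """
    return (len(probe8) == 8
            and len(window12) == 8 + gap
            and window12[:4] == probe8[:4]
            and window12[4 + gap:] == probe8[4:])


# One gapped probe is satisfied by all 4**4 = 256 choices of the middle four bases.
covered = {"ACGT" + "".join(mid) + "GGCC" for mid in product("ACGT", repeat=4)}
print(N_FULL_12MERS, N_TWO_SETS, N_FULL_12MERS // N_TWO_SETS)  # 16777216 131072 128
print(len(covered), all(matches_gapped("ACGTGGCC", w) for w in covered))  # 256 True
```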



III. POLYNUCLEOTIDE SEQUENCING 



[0084] In principle, the making of a substrate having a positionally defined matrix pattern of all possible oligonucleotides of a given length involves a conceptually simple method of synthesizing each and every different possible oligonucleotide and affixing it to a definable position. Oligonucleotide synthesis is presently mechanized and enabled by current technology and instruments supplied by Applied Biosystems, Foster City, California.

A. Preparation of Substrate Matrix 



[0085] The collection of specific oligonucleotides used in polynucleotide sequencing may be produced in at least two different ways. Present technology allows production of ten nucleotide oligomers on a solid phase or other synthesizing system. See, e.g., instrumentation provided by Applied Biosystems, Foster City, California. Although a single oligonucleotide can be made relatively easily, a large collection of them would typically require a fairly large amount of time and investment. For example, there are 4^10 = 1,048,576 possible ten nucleotide oligomers. Present technology allows making each and every one of them in a separate purified form, though such might be costly and laborious.

[0086] Once the desired repertoire of possible oligomer sequences of a given length has been synthesized, this collection of reagents may be individually positionally attached to a substrate, thereby allowing a batchwise hybridization step. Present technology also would allow the possibility of attaching each and every one of these 10-mers to a separate specific position on a solid matrix. This attachment could be automated in any of a number of ways, particularly by use of a caged biotin type linking. This would produce a matrix having each of the different possible 10-mers.
[0087] A batchwise hybridization is much preferred because of its reproducibility and simplicity. An automated process of attaching various reagents to positionally defined sites on a substrate is provided in PCT publication nos. WO90/15070 and WO91/07087.

[0088] Instead of separate synthesis of each oligonucleotide, these oligonucleotides are conveniently synthesized 
in parallel by sequential synthetic processes on a defined matrix pattern as provided in PCT publication no. 
WO90/15070. Here, the oligonucleotides are synthesized stepwise on a substrate at positionally separate and defined 
positions. Use of photosensitive blocking reagents allows for defined sequences of synthetic steps over the surface of 
a matrix pattern. By use of the binary masking strategy, the surface of the substrate can be positioned to generate a 
desired pattern of regions, each having a defined sequence oligonucleotide synthesized and immobilized thereto. 
[0089] Although the prior art technology can be used to generate the desired repertoire of oligonucleotide probes, 
an efficient and cost effective means would be to use the VLSIPS technology described in PCT publication no. 
WO90/15070. In this embodiment, the photosensitive reagents involved in the production of such a matrix are described 
below. 

[0090] The regions for synthesis may be very small, usually less than about 100 µm x 100 µm, more usually less than about 50 µm x 50 µm. The photolithography technology allows synthetic regions of less than about 10 µm x 10 µm, about 3 µm x 3 µm, or less. The detection also may detect such sized regions, though larger areas are more easily and reliably measured.

[0091] At a size of about 30 microns by 30 microns, one million regions would take about 11 square centimeters, or a single wafer of about 4 centimeters by 4 centimeters. Thus the present technology provides for making a single matrix of that size having all one million plus possible oligonucleotides. Region sizes are sufficiently small to correspond to densities of at least about 5 regions/cm², 20 regions/cm², 50 regions/cm², 100 regions/cm², and greater, including 300 regions/cm², 1000 regions/cm², 3K regions/cm², 10K regions/cm², 30K regions/cm², 100K regions/cm², 300K regions/cm², or more, even in excess of one million regions/cm².
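A quick check of the area and density figures, assuming closely packed square regions. Because the sketch ignores any spacing between regions, it gives a lower bound on the area a real array would need. The function names are hypothetical.

```python
UM_PER_CM = 10_000  # micrometres per centimetre


def array_area_cm2(n_regions: int, region_um: float) -> float:
    """Area of n closely packed square regions of side region_um (no spacing)."""
    side_cm = region_um / UM_PER_CM
    return n_regions * side_cm ** 2


def density_per_cm2(region_um: float) -> float:
    """Regions per square centimetre for closely packed square regions."""
    return (UM_PER_CM / region_um) ** 2


print(round(array_area_cm2(4 ** 10, 30), 2))  # 9.44
print(round(density_per_cm2(30)))             # 111111
```

For all 4^10 ten-mers in 30 micron regions, the active area alone comes to about 9.4 cm², which fits on a 4 cm x 4 cm (16 cm²) wafer.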

[0092] Although the pattern of the regions which contain specific sequences is theoretically not important, for practical 
reasons certain patterns will be preferred in synthesizing the oligonucleotides. Binary masking algorithms can be ap- 
plied to generate the pattern of known oligonucleotide probes. 

[0093] By use of these binary masks, a highly efficient means is provided for producing the substrate with the desired 
matrix pattern of different sequences. Although the binary masking strategy allows for the synthesis of all lengths of 
polymers, the strategy may be easily modified to provide only polymers of a given length. This is achieved by omitting 
steps where a subunit is not attached. 

[0094] The strategy for generating a specific pattern may take any of a number of different approaches. However, 
the binary masking and binary synthesis approaches provide a maximum of diversity with a minimum number of actual 
synthetic steps.
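The sketch below models a simple light-directed scheme in which, for each probe position, four illumination steps are made, one per base, each exposing exactly the regions whose probe carries that base at that position. This gives the 4 x k step count used for the 8-mer example earlier in this section (4 x 8 = 32 steps). It is a simplified model for illustration, not the specific binary masking algorithm of PCT publication no. WO90/15070, and `synthesis_plan` is a hypothetical name.

```python
from itertools import product

BASES = "ACGT"


def synthesis_plan(k: int) -> tuple[list[str], list[str], int]:
    """Simulate building every k-mer, one region per k-mer, in 4 * k coupling steps.

    For each position and each base, one illumination step exposes exactly the
    regions whose assigned probe has that base at that position, and the base
    is coupled there; unexposed regions remain blocked.
    """
    regions = ["".join(p) for p in product(BASES, repeat=k)]  # probe assigned to each region
    grown = [""] * len(regions)                                # what has been built so far
    steps = 0
    for pos in range(k):
        for base in BASES:
            exposed = [probe[pos] == base for probe in regions]  # the "mask" for this step
            for i, is_exposed in enumerate(exposed):
                if is_exposed:
                    grown[i] += base
            steps += 1
    return regions, grown, steps


regions, grown, steps = synthesis_plan(3)
print(len(regions), steps, grown == regions)  # 64 12 True
```

Each step exposes one quarter of the regions, so every region receives exactly one base per position and all 4^k sequences are complete after 4 x k steps.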

[0095] The length of oligonucleotides used in sequencing applications will be selected on criteria determined to some extent by the practical limits discussed above. For example, if probes are made as eight nucleotide oligonucleotides, there will be 65,536 possible sequences. If a nine subunit oligonucleotide is selected, there are 262,144 possible permutations of sequences. If a ten-mer oligonucleotide is selected, there are 1,048,576 possible permutations of sequences. As the number gets larger, the required number of positionally defined subunits necessary to saturate the possibilities also increases. With respect to hybridization conditions, the length of the matching necessary to confer stability under the conditions selected can be compensated for. See, e.g., Kanehisa, M. (1984) Nuc. Acids Res. 12:203-213.
[0096] Although not described in detail here, but below for oligonucleotide probes, the VLSIPS technology would typically use a photosensitive protective group on an oligonucleotide. Sample oligonucleotides are shown in Figure 4. In particular, the photoprotective group on the nucleotide molecules may be selected from a wide variety of positive light reactive groups, preferably including nitro aromatic compounds such as o-nitrobenzyl derivatives or benzylsulfonyl. See, e.g., Gait (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford. In a preferred embodiment, 6-nitroveratryloxycarbonyl (NVOC), 2-nitrobenzyloxycarbonyl (NBOC), or α,α-dimethyl-dimethoxybenzyloxycarbonyl (DDZ) is used. Photoremovable protective groups are described in, e.g., Patchornik (1970) J. Amer. Chem. Soc. 92:6333; and Amit et al. (1974) J. Organic Chem. 39:192.

[0097] A preferred linker is used to attach the oligonucleotide to a silicon matrix. A more detailed description is provided below. A photosensitive blocked nucleotide may be attached to specific locations of unblocked prior cycles of attachment on the substrate and can be successively built up to the correct length oligonucleotide probe.
[0098] It should be noted that multiple substrates may be simultaneously exposed to a single target sequence, where each substrate is a duplicate of one another or where, in combination, multiple substrates together provide the complete or desired subset of possible subsequences. This provides the opportunity to overcome a limitation of the density of positions on a single substrate by using multiple substrates. In the extreme case, each probe might be attached to a single bead or substrate and the beads sorted by whether there is a binding interaction. Those beads which do bind might be encoded to indicate the subsequence specificity of reagents attached thereto.

[0099] Then the target may be bound to the whole collection of beads, and those beads that have appropriate specific reagents on them will bind to target. Then a sorting system may be utilized to sort those beads that actually bind the target from those that do not. This may be accomplished by presently available cell sorting devices or a similar apparatus. After the relatively small number of beads which have bound the target have been collected, the encoding scheme may be read off to determine the specificity of the reagent on the bead. An encoding system may include a magnetic system, a shape encoding system, a color encoding system, or a combination of any of these, or any other encoding system. Once again, with the collection of specific interactions that have occurred, the binding may be analyzed for sequence information, fingerprint information, or mapping information.

[0100] The parameters of polynucleotide sizes of both the probes and target sequences are determined by the applications. The length of oligonucleotide probes used will depend in part upon the limitations of the VLSIPS technology to provide the number of desired probes. For example, in an absolute sequencing application, it is often useful to have virtually all of the possible oligonucleotides of a given length. As indicated above, there are 65,536 8-mers, 262,144 9-mers, 1,048,576 10-mers, 4,194,304 11-mers, etc. As the length of the oligomer increases, the number of different probes which must be synthesized also increases, by a factor of 4 for every additional nucleotide. Eventually the size of the matrix and the limitations in the resolution of regions in the matrix will reach the point where an increase in number of probes becomes disadvantageous. However, this sequencing procedure requires that the system be able to distinguish, by appropriate selection of hybridization and washing conditions, between binding of absolute fidelity and binding of complementary sequences containing mismatches. On the other hand, if the fidelity is unnecessary, this discrimination is also unnecessary and a significantly longer probe may be used. Longer probes would typically be useful in fingerprinting or mapping applications.
[0101] The length of the probe is selected so that it will bind with specificity to possible targets. The hybridization conditions are also very important in that they will determine how close the homology of complementary binding must be for it to be detected. In fact, a single target may be evaluated at a number of different conditions to determine its spectrum of specificity for binding particular probes. This may find use in a number of other applications besides polynucleotide sequencing, fingerprinting, or mapping. In a related fashion, different regions with reagents having differing affinities or levels of specificity may allow such a spectrum to be defined using a single incubation, where various regions, at a given hybridization condition, show the binding affinity. For example, fingerprint probes of various lengths or with specific defined non-matches may be used. Unnatural nucleotides or nucleotides exhibiting modified specificity of complementary binding are described in greater detail in Macevicz (1990) PCT pub. No. WO 90/04652; and see the section on modified nucleotides in the Sigma Chemical Company catalogue.
B. Labeling Target Nucleotide

[0102] The label used to detect the target sequences will be determined, in part, by the detection methods being applied. Thus, the labeling method and label used are selected in combination with the actual detecting systems being used.

[0103] Once a particular label has been selected, appropriate labeling protocols will be applied, as described below for specific embodiments. Standard labeling protocols for nucleic acids are described, e.g., in Sambrook et al.; and Kambara, H. et al. (1988) Bio/Technology. Labeling of proteins and peptides is described, e.g., in Allen, G. (1989) Sequencing of Proteins and Peptides, Elsevier, New York, especially chapter 5; and Greenstein and Winitz (1961) Chemistry of the Amino Acids, Wiley and Sons, New York. Carbohydrate labeling is described, e.g., in Chaplin and Kennedy (1986) Carbohydrate Analysis: A Practical Approach, IRL Press, Oxford. Labeling of other polymers will be performed by methods applicable to them as recognized by a person having ordinary skill in manipulating the corresponding polymer.

[0104] In some embodiments, the target need not actually be labeled if a means for detecting where interaction takes place is available. As described below, for a nucleic acid embodiment, such may be provided by an intercalating dye which intercalates only into double stranded segments, e.g., where interaction occurs. See, e.g., Sheldon et al., U.S. Pat. No. 4,582,789.

[0105] In many uses, the target sequence will be absolutely homogeneous, both with respect to the total sequence and with respect to the ends of each molecule. Homogeneity with respect to sequence is important to avoid ambiguity. It is preferable that the target sequences of interest not be contaminated with a significant amount of labeled contaminating sequences. The extent of allowable contamination will depend on the sensitivity of the detection system and the inherent signal-to-noise ratio of the system. Homogeneous contaminating sequences will be particularly disruptive of the sequencing procedure.

[0106] However, although the target polynucleotide must have a unique sequence, the target molecules need not have identical ends. In fact, the homogeneous target molecule preparation may be randomly sheared to increase the number of molecules. Since the total information content remains the same, the shearing results only in a higher number of distinct fragments which may be labeled and bind to the probe. This fragmentation may give a vastly superior signal relative to a preparation of the target molecules having homogeneous ends. The signal for the hybridization is likely to be dependent on the numerical frequency of the target-probe interactions: if a sequence is individually found on a larger number of separate molecules, a better signal will result. In fact, shearing a homogeneous preparation of the target may often be preferred before the labeling procedure is performed, thereby producing a large number of labeling groups associated with each subsequence.

C. Hybridization Conditions 

[0107] The hybridization conditions between probe and target should be selected such that the specific recognition interaction, i.e., hybridization, of the two molecules is both sufficiently specific and sufficiently stable. See, e.g., Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford. These conditions will be dependent both on the specific sequence and often on the guanine and cytosine (GC) content of the complementary hybrid strands. The conditions may often be selected to be universally equally stable independent of the specific sequences involved. This typically will make use of a reagent such as an arylammonium buffer. See Wood et al. (1985) "Base Composition-independent Hybridization in Tetramethylammonium Chloride: A Method for Oligonucleotide Screening of Highly Complex Gene Libraries," Proc. Natl. Acad. Sci. USA 82:1585-1588; and Khrapko et al. (1989) "An Oligonucleotide Hybridization Approach to DNA Sequencing," FEBS Letters 256:118-122. An arylammonium buffer tends to minimize differences in hybridization rate and stability due to GC content. By virtue of the fact that sequences then hybridize with approximately equal affinity and stability, there is relatively little bias in strength or kinetics of binding for particular sequences. Temperature and salt conditions, along with other buffer parameters, should be selected such that the kinetics of renaturation are essentially independent of the specific target subsequence or oligonucleotide probe involved. In order to ensure this, the hybridization reactions will usually be performed in a single incubation of all the substrate matrices together, exposed to the identical target probe solution under the same conditions.
[0108] Alternatively, various substrates may be individually treated differently. Different substrates may be produced, each having reagents which bind to target subsequences with substantially identical stabilities and kinetics of hybridization. For example, all of the high GC content probes could be synthesized on a single substrate which is treated accordingly. In this embodiment, the arylammonium buffers could be unnecessary. Each substrate is then treated in such a manner that the collection of substrates shows essentially uniform binding, and the hybridization data of target binding to the individual substrate matrix is combined with the data from other substrates to derive the necessary subsequence binding information. The hybridization conditions will usually be selected to be sufficiently specific that the fidelity of base matching will be properly discriminated. Of course, control hybridizations should be included to determine the stringency and kinetics of hybridization.
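The option of placing probes with similar GC content on the same substrate can be sketched by grouping probes on their G+C count. This assumes, as a simplification, that equal-length probes with the same GC count have broadly similar stability; `gc_count` and `group_by_gc` are hypothetical names, and no thermodynamic model is implied.

```python
from collections import defaultdict
from itertools import product


def gc_count(seq: str) -> int:
    """Number of G and C bases in a probe."""
    return sum(base in "GC" for base in seq)


def group_by_gc(probes: list[str]) -> dict[int, list[str]]:
    """Map each GC count to the probes having it; each group could go on its own substrate."""
    groups: dict[int, list[str]] = defaultdict(list)
    for p in probes:
        groups[gc_count(p)].append(p)
    return dict(groups)


probes = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers
groups = group_by_gc(probes)
print({gc: len(ps) for gc, ps in sorted(groups.items())})
# {0: 16, 1: 64, 2: 96, 3: 64, 4: 16}
```

The group sizes follow C(4, j) x 2^4 for j G or C positions, so the intermediate GC classes hold most of the probes.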
D. Detection; VLSIPS Scanning 

[0109] The next step of the sequencing process by hybridization involves labeling of target polynucleotide molecules. An easily detectable signal is preferred. The VLSIPS apparatus is designed to easily detect a fluorescent label, so fluorescent tagging of the target sequence is preferred. Other suitable labels include heavy metal labels, magnetic probes, chromogenic labels (e.g., phosphorescent labels, dyes, and fluorophores), spectroscopic labels, enzyme linked labels, radioactive labels, and labeled binding proteins.

[0110] The detection methods used to determine where hybridization has taken place will typically depend upon the label selected above. Thus, for a fluorescent label, a fluorescent detection step will typically be used. PCT publication no. WO90/15070 describes apparatus and mechanisms for scanning a substrate matrix using fluorescence detection, but a similar apparatus is adaptable for other optically detectable labels.

[0111] The detection method provides a positional localization of the region where hybridization has taken place. This position is correlated with the specific sequence of the probe, since the probe has specifically been attached or synthesized at a defined substrate matrix position. Having collected all of the data indicating the subsequences present in the target sequence, the overlap analysis described above may be applied to reconstruct the target sequence.

[0112] It is also possible to dispense with actual labeling if some means for detecting the positions of interaction between the sequence specific reagent and the target molecule are available. This may take the form of an additional reagent which can indicate the sites either of interaction, or the sites of lack of interaction, e.g., a negative label. For nucleic acid embodiments, locations of double strand interaction may be detected by the incorporation of intercalating dyes, or other reagents such as antibody or other reagents that recognize helix formation, see, e.g., Sheldon et al., U.S. Pat. No. 4,582,789.
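Because each probe sits at a known position, turning a scanned image into a list of hybridizing subsequences is essentially a lookup. The sketch assumes a hypothetical layout dictionary from (row, column) to probe sequence and a fixed intensity threshold; real scanning software, background correction, and threshold selection are outside its scope.

```python
def hybridized_probes(layout: dict[tuple[int, int], str],
                      intensities: dict[tuple[int, int], float],
                      threshold: float) -> set[str]:
    """Probe sequences at the positions whose signal exceeds the threshold."""
    return {layout[pos] for pos, signal in intensities.items() if signal > threshold}


# Hypothetical 2 x 3 array: the position of each probe is known from synthesis.
layout = {(0, 0): "GGC", (0, 1): "GCT", (0, 2): "CTA",
          (1, 0): "AAA", (1, 1): "TTT", (1, 2): "CCC"}
# Hypothetical scanned fluorescence intensities (arbitrary units).
intensities = {(0, 0): 950.0, (0, 1): 870.0, (0, 2): 910.0,
               (1, 0): 40.0, (1, 1): 55.0, (1, 2): 38.0}
print(sorted(hybridized_probes(layout, intensities, threshold=100.0)))
# ['CTA', 'GCT', 'GGC']
```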

E. Analysis 

[0113] Although the reconstruction can be performed manually as illustrated above, a computer program will typically be used to perform the overlap analysis. A program may be written and run on any of a large number of different computer hardware systems. The variety of operating systems and programming languages available will be recognized by a person of ordinary skill in the art.
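A minimal sketch of such an overlap program is shown below. It assumes a complete, error-free set of equal-length subsequences and no repeated overlaps; instead of resolving the complications discussed under Complications, it reports them as errors. `reconstruct` is a hypothetical name.

```python
def reconstruct(spectrum: set[str]) -> str:
    """Chain equal-length subsequences by overlaps one character shorter than their length.

    Assumes a complete, error-free spectrum in which no overlap is repeated.
    """
    if not spectrum:
        raise ValueError("no subsequences given")
    k = len(next(iter(spectrum)))
    if k < 2 or any(len(s) != k for s in spectrum):
        raise ValueError("subsequences must all have the same length k >= 2")
    by_prefix: dict[str, str] = {}
    for s in spectrum:
        if s[:-1] in by_prefix:
            raise ValueError(f"repeated overlap {s[:-1]!r}; reconstruction is ambiguous")
        by_prefix[s[:-1]] = s
    suffixes = {s[1:] for s in spectrum}
    starts = [s for s in spectrum if s[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting subsequence")
    seq = current = starts[0]
    visited = {current}
    # Follow the unique successor of each subsequence until all have been used.
    while len(visited) < len(spectrum) and current[1:] in by_prefix:
        current = by_prefix[current[1:]]
        if current in visited:
            raise ValueError("overlaps form a cycle; reconstruction is ambiguous")
        visited.add(current)
        seq += current[-1]
    if len(visited) != len(spectrum):
        raise ValueError("subsequences do not form a single chain")
    return seq


print(reconstruct({"GGC", "GCT", "CTA"}))  # GGCTA
bits = "1010011100"
print(reconstruct({bits[i:i + 5] for i in range(len(bits) - 4)}))  # 1010011100
```

Applied to the 3-character subsequences of a target containing a repeat, such as ACGTCCACGTGG, the same function raises an error rather than guessing.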

F. Substrate Reuse 

[0114] After a particular target sequence has been hybridized and the pattern of hybridization analyzed, the matrix substrate should be reusable and readily prepared for exposure to a second or subsequent target polynucleotide. In order to do so, the hybridized target is removed by a treatment to which the substrate, the attached probes, and the linkages to the substrate are inert. This treatment may include an elevated temperature, organic or inorganic solvents, modifications in pH, and other means for disrupting specific interactions. Thereafter, a second target may actually be applied to the recycled matrix and analyzed as before.

IV. FINGERPRINTING 



A. General 



[0115] The substrates and techniques used in the polynucleotide sequencing section are also appropriate for the fingerprinting embodiments. See, e.g., Poustka et al. (1986) Cold Spring Harbor Symp. Quant. Biol., Cold Spring Harbor Press, New York. The fingerprinting method provided herein is based, in part, upon the ability to positionally localize a large number of different specific probes onto a single substrate. This high density matrix pattern provides the ability to screen for, or detect, a very large number of different sequences simultaneously. In fact, depending upon the hybridization conditions, fingerprinting to the resolution of virtually absolute matching of sequence is possible, thereby approaching an absolute sequencing embodiment. And the sequencing embodiment may be useful in identifying the probes useful in further fingerprinting uses. For example, characteristic features of genetic sequences will be identified as being diagnostic of the entire sequence. However, in most embodiments, longer probe and target sequences will be used, for which slight mismatching may not need to be resolved.
B. Preparation of Substrate Matrix

[0116] A collection of specific probes may be produced by either of the methods described above in the section on sequencing. Specifically, oligonucleotide probes of desired lengths may be synthesized using an oligonucleotide synthesizer. The length of these probes is limited only by the ability of the synthesizer to continue to accurately synthesize a molecule. Oligonucleotides or sequence fragments may also be isolated.

[0117] In one embodiment, the individually isolated probes may be attached to the matrix at defined positions. These probes may be attached by an automated process using photochemical reagents, see, e.g., Dattagupta et al. (1985) U.S. Pat. No. 4,542,102 and (1987) U.S. Pat. No. 4,713,326. Each individual purified reagent can be attached individually at specific locations on a substrate.

[0118] In another embodiment, a VLSIPS synthesizing technique may be used to synthesize the desired probes at specific positions on a substrate. The probes may be synthesized by successively adding appropriate monomer subunits, e.g., nucleotides, to generate the desired sequences.

[0119] In another embodiment, a relatively short specific oligonucleotide is used which serves as a targeting reagent for positionally directing the sequence recognition reagent. For example, the sequence specific reagents having a separate additional targeting segment (usually of a different polymer from the target sequence) can be directed to target oligonucleotides attached to the substrate. By use of non-natural targeting reagents, e.g., unusual nucleotide analogues which pair with other unnatural nucleotide analogues and which do not interfere with natural nucleotide interactions, the natural and non-natural portions can coexist on the same molecule without interfering with their respective functionalities. This can combine both a synthetic and a biological production system, analogous to the technique for targeting monoclonal antibodies to locations on a VLSIPS substrate at defined positions.

[0120] After the separate substrate attached reagents are attached to the targeting segment, the two are crosslinked, thereby permanently attaching them to the substrate. Suitable crosslinking reagents are known, see, e.g., Dattagupta et al. Attachment of proteins to a solid substrate is described, e.g., in Merrifield (1986) Science 232:341.

C. Labeling Target Nucleotides 

[0121] The labeling procedures used in the sequencing embodiments will also be applicable in the fingerprinting embodiments. However, since the fingerprinting embodiments often will involve relatively large target molecules, the amount of signal necessary to incorporate into the target sequence may be less critical than in the sequencing applications. For example, a long target with a relatively small number of labels per molecule may be easily amplified or detected because of the relatively large target molecule size.
[0122] In various embodiments, it may be desired to cleave the target into smaller segments, as in the sequencing embodiments, using appropriate shearing and cleavage techniques.

D. Hybridization Conditions 

[0123] The hybridization conditions used in fingerprinting embodiments will typically be less critical than for the sequencing embodiments. The reason is that the amount of mismatching which may be useful in providing the fingerprinting information would typically be far greater than that necessary in sequencing uses. For example, Southern hybridizations do not typically distinguish between slightly mismatched sequences. Under these circumstances, fingerprinting may be performed under less stringent conditions while still providing valuable information. However, since the entire substrate is typically exposed to the target molecule at one time, the binding affinity of the probes should usually be of approximately comparable levels. For this reason, if oligonucleotide probes are being used, their lengths should be approximately comparable and will be selected to hybridize under conditions which are common for most of the probes on the substrate. Much as in a Southern hybridization, the target and oligonucleotide probes are of lengths typically greater than about 25 nucleotides. Under appropriate hybridization conditions, e.g., typically higher salt and lower temperature, the probes will hybridize irrespective of perfect complementarity.

[0124] Typically the fingerprinting is merely for probing similarity or homology. Thus, the stringency of hybridization can usually be decreased to fairly low levels. See, e.g., Wetmur and Davidson (1968) "Kinetics of Renaturation of DNA," J. Mol. Biol. 31:349-370; and Kanehisa, M. (1984) Nuc. Acids Res. 12:203-213.

E. Detection; VLSIPS Scanning 

[0125] Detection methods will be selected which are appropriate for the selected label. The output of the scanning device need not necessarily be digitized or placed into a specific digital database, though such would most likely be done. For example, the analysis in fingerprinting could be photographic. Where a standardized fingerprint substrate matrix is used, the pattern of hybridizations may be spatially unique and may be compared photographically. In this manner, each sample may have a characteristic pattern of interactions, and the likelihood of identical patterns will preferably be of such low frequency that the fingerprint pattern indeed becomes a characteristic pattern virtually as unique as an individual's fingertip fingerprint. With a standardized substrate, every individual could, in theory, be uniquely identifiable on the basis of the pattern of hybridization to the substrate.

[0126] Of course, the VLSIPS scanning apparatus may also be useful to generate a digitized version of the fingerprint pattern. In this way, the identification pattern can be provided as a linear string of digits. This sequence could also be used for a standardized identification system providing significant useful medical transferability of specific data. In one embodiment, where the probes used are selected to be of sufficiently high resolution to measure polynucleotides encoding antigens of the major histocompatibility complex, it might even be possible to provide transplantation matching data in a linear stream of data. The fingerprinting data may provide a condensed version, or summary, of the linear genetic data, or of any other information data base.
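The digitized, linear form of a fingerprint described above can be sketched as follows, assuming the scanner reports one intensity per array position in row-major order and that a single fixed threshold separates hybridized from non-hybridized positions. The threshold, the example intensities, and the use of Hamming distance to compare two fingerprints are illustrative assumptions.

```python
from typing import Sequence


def to_fingerprint(intensities: Sequence[Sequence[float]], threshold: float) -> str:
    """Flatten a 2-D grid of scanned intensities (row-major) into a string of
    '0'/'1' digits: '1' where the signal is at or above the threshold."""
    return "".join("1" if value >= threshold else "0"
                   for row in intensities for value in row)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must come from the same substrate layout")
    return sum(x != y for x, y in zip(a, b))


scan_1 = [[0.9, 0.1, 0.7],
          [0.2, 0.8, 0.05]]  # hypothetical normalized intensities
scan_2 = [[0.85, 0.3, 0.1],
          [0.15, 0.95, 0.0]]
fp1 = to_fingerprint(scan_1, threshold=0.5)
fp2 = to_fingerprint(scan_2, threshold=0.5)
print(fp1, fp2, hamming(fp1, fp2))  # 101010 100010 1
```

Because the substrate layout is standardized, digit i always refers to the same probe, so two such strings can be compared position by position.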

F. Analysis 

[0127] The analysis of the fingerprint will often be much simpler than a total sequence determination. However, there may be particular types of analysis which will be substantially simplified by a selected group of probes. For example, probes which exhibit particular populational heterogeneity may be selected. In this way, analysis may be simplified and practical utility enhanced merely by careful selection of the specific probes and a careful matrix layout of those probes.

G. Substrate Reuse 

[0128] As with the sequencing application, the fingerprinting usages may also take advantage of the reusability of the substrate. In this way, the interactions can be disrupted, the substrate treated, and the renewed substrate is equivalent to an unused substrate.

H. Other Polynucleotide Aspects 

[0129] Besides using the fingerprinting method for analyzing the structure of a particular polynucleotide, the fingerprinting method may be used to characterize various samples. For example, a cell or population of cells may be tested for their expression of particular mRNA sequences, or for patterns of expressed mRNA species. This may be applicable [...] in comparing the messenger RNA population expressed by a cell to the genetic content of the cell.
[0130] RNA can be isolated from a cell or a cell population, such as a purified cell fraction or a biopsy sample. The RNA may be labeled, for example, by attaching a fluorescent molecule to isolated RNA or by using radiolabeled RNA (e.g., end-labeled with T4 polynucleotide kinase). A VLSIPS substrate containing positionally discrete oligonucleotide sequences may then be exposed to the pool of labeled RNA species under conditions permitting specific hybridization. The pattern of positions at which labeled RNA has formed specific hybrids may be compared to a reference pattern to identify, and in some embodiments quantify, the expressed RNA species, or to identify the hybridization pattern itself as being characteristic of a particular cell type.

[0131] For example, but not for limitation, a VLSIPS oligonucleotide substrate may be hybridized to a labeled RNA sample obtained from a first cell type (e.g., human lymphocytes) to establish a reference hybridization pattern for the first cell type. Similarly, an identical VLSIPS oligonucleotide substrate may be hybridized to a labeled RNA sample obtained from a second cell type (e.g., human monocytes) to establish a reference hybridization pattern for the second cell type. Labeled RNA may then be prepared from a cell or a cell population and hybridized to an identical VLSIPS oligonucleotide substrate, and the resultant hybridization pattern can be compared to the reference hybridization patterns established for the first and second cell types. By such comparisons, the RNA expression pattern of a cell or cell population can be identified as being similar to or distinct from one or more reference hybridization patterns.
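The comparison against reference patterns described above can be sketched as a nearest-reference search. Pearson correlation is used here as one possible similarity measure; it is an assumption, as are the example intensity values (the cell-type names follow the example in the text).

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("patterns must have the same length (at least 2 loci)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sd_x * sd_y)


def closest_reference(sample: Sequence[float],
                      references: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Score the sample against every reference pattern; return the best match and all scores."""
    scores = {name: pearson(sample, ref) for name, ref in references.items()}
    return max(scores, key=scores.get), scores


references = {
    "lymphocyte": [5.0, 1.0, 4.0, 0.5, 3.0],  # hypothetical intensities, one per locus
    "monocyte":   [1.0, 5.0, 0.5, 4.0, 2.0],
}
unknown = [4.5, 1.2, 3.8, 0.7, 2.9]
best, scores = closest_reference(unknown, references)
print(best, scores)
```

A correlation-based score compares the shape of the patterns rather than their absolute brightness, which may help when overall labeling efficiency differs between samples.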
[0132] Where a positionally discrete oligonucleotide on the VLSIPS substrate is in molar excess over the amount of the cognate (complementary) labeled RNA species in the hybridization reaction, the amount of specific hybridization to that VLSIPS locus (as measured by labeling intensity at that locus) can provide a quantitative measurement of the abundance of that RNA species present in the labeled RNA sample. Thus, hybridization of labeled RNA to a VLSIPS oligonucleotide substrate can provide information identifying the individual RNA species that are expressed in a particular cell or cell population, as well as the relative abundance of one or more individual RNA species. This information can serve to fingerprint specific cell types or particular stages in cell differentiation.
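Under the molar-excess condition stated above, the signal at a locus should scale with the amount of the cognate RNA species. A minimal sketch, assuming a linear response and a single background value measured at probe-free positions, estimates relative abundances by background subtraction and normalization. Linearity, the background model, the locus names, and the function name are assumptions.

```python
from typing import Dict


def relative_abundance(intensities: Dict[str, float], background: float) -> Dict[str, float]:
    """Background-subtract each locus (clipping at zero) and express each species as a
    fraction of the total corrected signal. Assumes signal is linear in abundance,
    which requires the surface probe to be in molar excess over its cognate RNA."""
    corrected = {name: max(value - background, 0.0) for name, value in intensities.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal above background")
    return {name: value / total for name, value in corrected.items()}


signals = {"locus_A": 1100.0, "locus_B": 350.0, "locus_C": 100.0}  # hypothetical scan values
print(relative_abundance(signals, background=100.0))
```

Comparisons between different species additionally assume similar labeling and hybridization efficiencies at their respective loci.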

[0133] For example, but not for limitation, RNA samples prepared from tissue biopsies, specifically tumor biopsies, can be labeled and hybridized to a VLSIPS oligonucleotide substrate, and the resultant hybridization pattern can provide information regarding cell type, degree of differentiation, and metastatic potential (malignancy). Some of the positionally [...] may hybridize specifically with RNA species transcribed from various protooncogenes [...], which are expressed at elevated levels in neoplastic tissues.

[0134] In diagnostic applications, labeled RNA samples from various neoplastic cell types may be hybridized to a VLSIPS oligonucleotide substrate, and the resultant hybridization pattern(s) compared with the hybridization pattern(s) obtained with RNA from related, non-neoplastic cell types. Identification of distinctions between the hybridization patterns obtained with RNA from neoplastic cells, as compared to patterns obtained from RNA from non-neoplastic cells, may be of diagnostic value and may identify RNA species that encode proteins that are potential targets for novel therapeutic modalities. In fact, the high resolution of the test will allow more complete characterization of parameters which define particular diseases. Thus, the power of diagnostic tests may be limited by the extent of statistical correlation [...]. [...] the means to generate this large universe of possible reagents and the ability to actually accumulate that correlative [...].
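One simple way to surface the distinctions between neoplastic and non-neoplastic patterns mentioned above is to flag loci whose signal ratio exceeds a fold-change cutoff. The log2 ratio, the 2-fold cutoff, the signal floor, and the probe names below are assumptions chosen for illustration; the text itself does not prescribe a statistic.

```python
import math
from typing import Dict, List, Tuple


def differential_loci(tumor: Dict[str, float], normal: Dict[str, float],
                      min_fold: float = 2.0, floor: float = 1.0) -> List[Tuple[str, float]]:
    """Return (locus, log2 tumor/normal ratio) for loci changed at least min_fold in
    either direction, largest change first. `floor` keeps near-zero signals from
    producing division by zero or infinite ratios."""
    cutoff = math.log2(min_fold)
    hits = []
    for locus in sorted(tumor.keys() & normal.keys()):
        ratio = math.log2(max(tumor[locus], floor) / max(normal[locus], floor))
        if abs(ratio) >= cutoff:
            hits.append((locus, ratio))
    return sorted(hits, key=lambda item: (-abs(item[1]), item[0]))


tumor = {"probe_1": 800.0, "probe_2": 100.0, "probe_3": 50.0}   # hypothetical signals
normal = {"probe_1": 100.0, "probe_2": 110.0, "probe_3": 200.0}
print(differential_loci(tumor, normal))
```

Loci flagged this way would only be candidates; as the paragraph notes, their diagnostic value depends on statistical correlation across many samples.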

[0135] For fingerprinting of RNA expression patterns, the VLSIPS substrate polynucleotides will be at least 12 nucleotides in length, preferably at least 15 nucleotides in length, more preferably at least 25 nucleotides in length. The sequences of the positionally distinct polynucleotides on the VLSIPS substrate may be selected from published sources, including but not limited to computerized databases such as GenBank, and may or may not include [...]. [...] expression patterns will typically employ high-stringency washes so as to provide hybridization patterns that reflect predominantly specific hybridization. However, some nonspecific hybridization and/or cross-hybridization to slightly mismatched sequences may be tolerated, and in some embodiments may be desirable.
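One way to derive candidate substrate polynucleotides of the lengths stated above from a published cDNA sequence (for example, one retrieved from GenBank) is to tile overlapping windows and take the reverse complement of each, so that each immobilized probe is complementary to the sense-strand mRNA. The window length, step size, example sequence, and function names are illustrative assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_antisense_probes(cdna: str, length: int = 25, step: int = 10) -> List[Tuple[int, str]]:
    """Slide a window of `length` nt along a sense-strand cDNA sequence every `step` nt and
    return (start, probe) pairs, where each probe is complementary to that window and
    therefore to the corresponding region of the mRNA."""
    if length < 12:
        raise ValueError("the text calls for substrate polynucleotides of at least 12 nt")
    if step < 1:
        raise ValueError("step must be positive")
    return [(start, reverse_complement(cdna[start:start + length]))
            for start in range(0, len(cdna) - length + 1, step)]


example = "ATGGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGGCTTAA"  # hypothetical sense-strand sequence
for start, probe in tile_antisense_probes(example, length=25, step=5):
    print(start, probe)
```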

[0136] [...] The high-density means of screening the presence or absence of specific interactions allows for the possibility of screening for, if not saturating, all of a very large number of possible interactions. This is [...] testing [...] combinations of molecular properties which can define a class of samples. For example, a species of organism may be characterized by its DNA sequences, e.g., a genetic fingerprint. By using a fingerprinting method, it may be determined that all members of that species are sufficiently similar [...] readily identified as being within a particular group. Thus, [...] defined [...] resolved by their similarity in fingerprint patterns. Alternatively, a non-member of that group will fail to share those many [...]. However, while the technology allows [...] species [...], it also provides the ability to more finely distinguish between closely related different cells or samples. This will have important applications in diagnosing viral, bacterial, and other pathological or nonpathological infections.
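Group membership by similarity of fingerprint patterns, as described above, can be sketched by representing each fingerprint as the set of array positions giving a positive signal. The Jaccard index, the mean-similarity rule, the 0.75 threshold, and the example positions are assumptions chosen for illustration.

```python
from typing import Iterable, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared positive positions divided by all positive positions (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_group_member(sample: Set[int], group: Iterable[Set[int]],
                    min_similarity: float = 0.75) -> bool:
    """Assign the sample to the group when its mean Jaccard similarity to the group's
    reference fingerprints reaches min_similarity (threshold chosen for illustration)."""
    scores = [jaccard(sample, ref) for ref in group]
    if not scores:
        raise ValueError("group has no reference fingerprints")
    return sum(scores) / len(scores) >= min_similarity


species_x = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}]  # hypothetical positive array positions
print(is_group_member({1, 2, 3, 4, 5}, species_x))  # True
print(is_group_member({1, 7, 8, 9}, species_x))     # False
```

Raising the threshold trades broad species-level grouping for the finer distinctions between closely related samples that the paragraph mentions.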
[0137] In particular, cell classification may be defined by any of a number of different properties. For example, a cell [...] contained therein. This allows species identification [...] parasitic [...]. For example, a human cell is presumably genetically distinguishable from a monkey cell, but different [...] resolution, each individual human [...] will have unique sequences that can define it as a single individual.

[0138] Likewise, a developmental stage of a cell type may be definable by its pattern of expression of messenger RNA. For example, in particular stages of cells, high levels of ribosomal RNA are found, whereas relatively low levels of [...] messenger [...] are found. The high-resolution distinguishability provided by this fingerprinting method allows the distinction between cells which have relatively minor differences in their expressed mRNA populations. [...]

[0139] In another embodiment, a substrate as provided herein may be used for genetic screening. This would allow [...] screening [...] thousands of genetic markers. As the density of the matrix is [...], many more [...] simultaneously tested. Genetic screening then [...] method as the present invention, which provides the ability to screen for thousands, tens of thousands, and hundreds of thousands, even millions of different [...] of high correlation [...] conditions [...]. [...] screening a large number of sequences provides the opportunity for generating the [...] which provide correlation between sequences and specific conditions or susceptibility. The present invention [...] to generate extremely valuable correlations useful for the genetic detection of the causative mutation leading to medical conditions. In still another embodiment, the present invention would be applicable to distinguishing two individuals having identical genetic compositions. The antibody population within an individual is dependent both on genetic and historical factors. Each individual experiences a unique exposure to various infectious agents, and [...] is partly determined thereby. Thus, individuals may also be [...] be useful for fingerprinting, perhaps in combination with other screening properties.

[0140] With the definition of new classes of cells, a cell sorter will be used to purify them. Moreover, new markers for defining that class of cells will be identified. For example, where the class is defined by its RNA content, it may be screened by antisense probes which detect the presence or absence of specific sequences. [...] cell lysates may provide information useful in correlating intracellular properties with extracellular markers [...] which indicate functional differences. [...] recognizes the internal presence of the specific sequences of interest, the cell sorter will be [...] homogeneous population of cells [...]. [...] for cells having a combination of a number of different markers.
[0141] [...] the fingerprinting method as an identification means arises from mosaicism problems in an organism. A mosaic organism is one where the genetic content in different cells is significantly different. Various [...] should have similar genetic fingerprints, though different clonal populations may have different genetic contents. See, e.g., [...] et al., An Introduction to Genetic Analysis (4th Ed.), Freeman and Co., New York.

[0142] The present invention also finds use in detecting changes, both genetic and in protein expression (i.e., by RNA expression fingerprinting), in a rapidly "evolving" protozoan infection, or similarly changing organism.

V. MAPPING 
A. General 

[0143] The use of the present invention for mapping parallels its use for fingerprinting and sequencing. Mapping [...] particular segments along the length of the polynucleotide. The mapping [...].

[0144] One approach is to take the target sequence and fragment it at specific points. The fragments are then ordered and attached to a solid substrate. For example, the clones resulting from a chromosome walking process may be individually attached to the substrate by methods, e.g., caged biotin techniques, indicated earlier. Segments of unknown map position will be exposed to the substrate and will hybridize to the segment which contains that particular sequence. This procedure allows the rapid determination of a number of different labeled [...]. The substrate may then be regenerated by removal of the interaction, and the next mapping segment applied.
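The readout step of this approach can be sketched as a lookup: immobilized clones sit at positions whose map order is already known (here, from a chromosome walk), and a labeled fragment is assigned to whichever clones light up. Clone names, signal values, and the threshold are hypothetical.

```python
from typing import Dict, List


def map_fragment(signals: Dict[str, float], clone_order: List[str],
                 threshold: float) -> List[int]:
    """Return indices, in map order, of the immobilized clones whose hybridization signal
    with the labeled fragment meets the threshold. Clones absent from `signals` count as 0."""
    return [index for index, clone in enumerate(clone_order)
            if signals.get(clone, 0.0) >= threshold]


walk = ["clone_1", "clone_2", "clone_3", "clone_4"]  # hypothetical chromosome-walk order
fragment_signals = {"clone_1": 20.0, "clone_2": 900.0, "clone_3": 850.0}
print(map_fragment(fragment_signals, walk, threshold=500.0))  # [1, 2]
```

A fragment that hybridizes to two adjacent clones, as in this example, would be consistent with its lying in the region where those clones overlap.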

[0145] In an alternative method, a plurality of subsequences can be attached to a substrate. Various short probes [...] segments contain particular overlaps. The theoretical basis and description of this mapping procedure is contained in, e.g., Evans et al. (1989) "Physical Mapping of Complex Genomes by Cosmid Multiplex Analysis," Proc. Natl. Acad. Sci. USA 86:5030-5034, and other references cited above in the [...] section. Labeled [...] approach, the details of the mapping [...].
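The overlap idea in this alternative method can be sketched in a simplified form; this is not a reproduction of the multiplex procedure of Evans et al. Each attached subsequence (for example, a cosmid) is probed with short labeled probes, and two subsequences are flagged as overlapping when they hybridize one or more of the same probes. Clone and probe names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_probe_overlaps(hits: Dict[str, Set[str]],
                          min_shared: int = 1) -> List[Tuple[str, str, int]]:
    """Return (clone_a, clone_b, n_shared) for every clone pair that hybridizes
    at least `min_shared` common probes."""
    pairs = []
    for a, b in combinations(sorted(hits), 2):
        shared = len(hits[a] & hits[b])
        if shared >= min_shared:
            pairs.append((a, b, shared))
    return pairs


hits = {  # hypothetical probes detected on each attached subsequence
    "cosA": {"p1", "p2", "p3"},
    "cosB": {"p3", "p4"},
    "cosC": {"p4", "p5"},
    "cosD": {"p9"},
}
print(shared_probe_overlaps(hits))  # [('cosA', 'cosB', 1), ('cosB', 'cosC', 1)]
```

Here the overlaps suggest the order cosA, cosB, cosC, with cosD unlinked. Probes drawn from repeated sequences can create false overlaps, which is one reason to require more than one shared probe in practice.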

B. Preparation of Substrate Matrix 

[0146] The substrate may be generated by either of the methods generally applicable in the sequencing and fingerprinting embodiments. The substrate may be made either synthetically, or by attaching [...]. The probes or sequences may be derived either from synthetic or biological means. As indicated above, the solid phase substrate synthetic methods may be utilized to generate a matrix with positionally [...]. In the mapping embodiment, saturation of all possible sequences of a selected length is far less important than in the sequencing embodiment, but the length of the probes used may be desired to be much longer. The processes for making a substrate which has longer oligonucleotide probes should not be significantly different from those described for the sequencing embodiments, but the optimization parameters may be modified to comply with the mapping needs.

C. Labeling 

[0147] The labeling methods will be similar to those applicable in sequencing and fingerprinting embodiments. Again, the target sequences may be desired to be fragmented.
D. Hybridization/Specific Interaction



[0148] The specificity of interaction between the targets and probes would typically be closer to that used for fingerprinting [...]

E. Detection 

[0149] The detection methods used in the mapping procedure will be virtually identical to those used in the fingerprinting embodiment. The detection methods will be selected in combination with the labeling method.

F. Analysis 

[0150] [...] the relative positions of different probes is performed. This may be achieved by synthesis of the substrate in a pattern or may result from analysis of sequences after they have been attached to the substrate.

[0151] For example, the probes may be randomly positioned at various locations on the substrate. However, the relative positions of the various reagents in the original polymer may be determined by using [...] individually, as target molecules, which determine the proximity of different probes. By an [...]

[0152] [...] mapping, as described above in the fingerprinting section, the developmental map of a cell [...] dimension. The mapping or fingerprinting embodiments may also be used in [...] rearrangements which may be genetically important, as in lymphocyte and B-cell development. In another example, various rearrangements or chromosomal dislocations may be tested by either the fingerprinting or the mapping embodiments.

G. Substrate Reuse 

[0153] [...] in the manner described in the fingerprinting section. The substrate is [...] washed and prepared for successive [...] to new target sequences.

VI. ADDITIONAL SCREENING AND APPLICATIONS 
A. Specific Interactions 

[0154] The production of a high-density plurality of spatially segregated polymers provides the ability to generate a [...] diverse repertoire of individually distinct sequence possibilities. As indicated above, particular oligonucleotides may be synthesized in automated fashion at specific locations on a matrix. In fact, [...] attaching each reagent at each desired position, the reagent may be attached to a specific desired complementary [...].
[0155] In addition, [...] the oligonucleotide sequence specificity of binding of a potential reagent may be tested by presenting to the reagent all of the possible subsequences available for binding. Although secondary or higher order sequence-specific features might not be easily screenable using this technology, it does provide a convenient, simple, quick, and thorough screen of a reagent's target recognition sequences. See, e.g., Pfeifer et al. (1989) Science 246:810-813.

[0156] For example, the interaction of a promoter protein with its target binding sequence may be tested for many binding sequences. By testing the strength of interactions under various different conditions, the interaction of the protein with each of the different potential binding sites may be analyzed. The spectrum of strength of interactions with each different potential binding site may provide significant insight into the types of features which are important in determining specificity.
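The sketch below enumerates the complete set of candidate subsequences of length k that such a substrate would present (4^k for DNA: 65,536 for 8-mers, 1,048,576 for 10-mers) and ranks sites by a measured interaction signal. The readout values and function names are hypothetical; real values would come from scanning the substrate under each condition tested.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def all_kmers(k: int, alphabet: str = "ACGT") -> Iterator[str]:
    """Yield every possible sequence of length k over the alphabet (4**k for DNA)."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


def strongest_sites(signals: Dict[str, float], top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate binding sequences by measured interaction strength
    (ties broken alphabetically)."""
    return sorted(signals.items(), key=lambda item: (-item[1], item[0]))[:top_n]


print(sum(1 for _ in all_kmers(8)))  # 65536 candidate 8-mers
measured = {"TATAAA": 950.0, "TATATA": 400.0, "GCGCGC": 15.0}  # hypothetical readout
print(strongest_sites(measured, top_n=2))
```

Repeating the ranking for each hybridization condition gives the "spectrum of strength of interactions" described above.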

[0157] [...] sequence-specific interaction [...] double-stranded nucleic acid structure with a single-stranded oligonucleotide. Often, a triple-stranded structure is produced which has significant aspects of sequence specificity. Testing of such interactions with either sequences comprising only natural nucleotides, or perhaps the testing of nucleotide analogs, may be very important in screening [...] reagents. See, e.g., [...] (1990) Biochemistry 29:9761-6765, and references therein.

B. Sequence Comparisons 

[0158] Once a gene is sequenced, the present invention provides means to compare alleles or related sequences to locate and identify differences from the control sequence. This would be extremely useful in further analysis of genetic variability at a specific gene locus.
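Once an allele's sequence has been read out and aligned to the control, locating differences reduces to a position-by-position comparison. This sketch assumes the two sequences are already aligned to equal length and handles substitutions only, not insertions or deletions; the example sequences are hypothetical.

```python
from typing import List, Tuple


def point_differences(control: str, allele: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, control base, allele base) wherever two aligned,
    equal-length sequences differ. Insertions and deletions are not handled."""
    if len(control) != len(allele):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, c, a)
            for i, (c, a) in enumerate(zip(control.upper(), allele.upper())) if c != a]


print(point_differences("ATGGCATTC", "ATGACATTG"))  # [(4, 'G', 'A'), (9, 'C', 'G')]
```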

C. Categorizations 

[0159] As indicated above in the fingerprinting and mapping embodiments, the present invention is also useful to [...]. For example, the developmental stage of a cell, or population of cells, can be dependent upon the expression of [...]. [...] screening procedures provided allow for high resolution definition of new classes of cells. In addition, the temporal development of particular cells will be characterized by the presence or expression of various mRNAs. Means to simultaneously screen a plurality or very large number of different sequences as provided [...] markers made available dramatically increases the ability to distinguish fairly closely related cell types. Other markers may be combined with markers and methods made available herein to define new classifications of biological samples, e.g., based upon new combinations of markers.

[0160] The presence or absence of particular marker sequences will be used to define temporal developmental stages. Once the stages are defined, fairly simple methods can be applied to actually purify those particular cells. For example, antisense probes or recognition reagents may be used with a cell sorter to select those cells containing or expressing the critical markers. Alternatively, the expression of those sequences may result in specific antigens which [...] classes and sorting those cells away from others. In this way, for example, it should be possible to select a class of omnipotent immune system cells which are able to completely regenerate a human immune system. [...] classes of cells having identifiable differences in RNA expression and/or DNA structure are made available.

[0161] In an alternative embodiment, subclasses of T-cells are defined, in part, upon the combination of expressed [...]. The present invention allows for the simultaneous screening of a large plurality of different RNA species together. Thus, higher resolution classification of different T-cell subclasses becomes possible, and with [...] functional differences which correlate with those other parameters, the ability to purify those cell types becomes available. This is applicable not only to T-cells, lymphocyte cells, or even to freely circulating cells. Many of the cells for which this would be most useful will be immobile cells found in particular tissues or organs. Tumor [...] using these fingerprinting techniques. Coupled with a temporal change in structure, [...] may also be selected and defined using these technologies. The present invention also provides the ability not only to define new classes of cells based upon functional or structural differences, but it also provides the ability to purify populations of cells which share these particular properties. In particular, antisense DNA or RNA molecules may be introduced into a cell to detect RNA sequences therein. See, e.g., Weintraub (1990) Scientific American 262:40-46.



D. Statistical Correlations 
[0162] In an additional embodiment, the present invention also allows for the high-resolution correlation of medical conditions with various different markers. For example, present technology, when applied to amniocentesis or other genetic screening methods, typically screens for tens of different markers at most. The present invention allows simultaneous screening for tens, hundreds, thousands, tens of thousands, hundreds of thousands, and even millions of different genetic sequences. Thus, applying the fingerprinting methods of the present invention to a sufficiently large population allows detailed statistical analysis to be made, thereby correlating particular medical conditions with particular markers, typically genetic markers or pathognomonic RNA expression patterns. Tumor-specific RNA expression patterns and particular RNA species characterizing various neoplastic phenotypes will be identified using the present invention.
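A per-marker association test between marker status and a medical condition can be sketched with a 2×2 Pearson chi-square statistic; with one degree of freedom its upper-tail probability is erfc(√(χ²/2)). Because many markers are screened at once, the significance level is Bonferroni-adjusted here. The counts, the marker number, and the choice of test and correction are hypothetical illustrations, not the statistical method of the original text.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table
                  marker present   marker absent
        cases            a                b
        controls         c                d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


chi2 = chi_square_2x2(30, 70, 10, 90)  # hypothetical counts
n_markers = 10_000                      # markers screened simultaneously (hypothetical)
alpha = 0.05 / n_markers                # Bonferroni-adjusted significance level
print(chi2, p_value_1df(chi2), p_value_1df(chi2) < alpha)
```

In this example χ² = 12.5 (p ≈ 4 × 10⁻⁴), which would be significant for a single marker but not after correcting for 10,000 simultaneous tests. This illustrates why the large populations mentioned above are needed when very many markers are screened.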

[0163] Various medical conditions may be correlated against an enormous data base of the sequences within an individual. Genetic propensities and correlations then become available, and high-resolution genetic prediction and correlation become much easier to perform. With the enormous data base, the reliability of the predictions also increases. [...] markers which are partially diagnostic of particular medical conditions or medical susceptibilities will be identified and provide direction in further studies and more careful analysis of the markers involved. Of course, as indicated above in the sequencing embodiment, the present invention will find much use in intense sequencing projects. For example, sequencing of the entire human genome in the human genome project will be greatly simplified and enabled by the present invention.

VII. FORMATION OF SUBSTRATE

[0164] The substrate is provided with a pattern of specific reagents which are positionally localized on the surface of the substrate. This matrix of positions is defined by the automated system which produces the substrate. The instrument will typically be one similar to that described in PCT publication no. WO90/15070. The instrumentation described therein is directly applicable to the applications used here. In particular, the apparatus comprises a substrate, typically a silicon-containing substrate, on which positions on the surface may be defined by a coordinate system of positions. These positions can be individually addressed or detected by the VLSIPS apparatus.
[0165] Typically, the VLSIPS apparatus uses optical methods used in semiconductor fabrication applications. In this way, masks may be used to photo-activate positions for attachment or synthesis of specific sequences on the substrate. These manipulations may be automated by the types of apparatus described in PCT publication no. WO90/15070.
[0166] Selectively removable protecting groups allow creation of well-defined areas of substrate surface having differing reactivities. Preferably, the protecting groups are selectively removed from the surface by applying a specific activator, such as electromagnetic radiation of a specific wavelength and intensity. More preferably, the specific activator exposes selected areas of surface to remove the protecting groups in the exposed areas.

[0167] Protecting groups of the present invention are used in conjunction with solid phase oligonucleotide syntheses using deoxyribonucleic and ribonucleic acids. In addition to protecting the substrate surface from unwanted reaction, the protecting groups block a reactive end of the monomer to prevent self-polymerization.

[0168] Attachment of a protecting group to the 5'-hydroxyl group of a nucleoside during synthesis using, for example, phosphate-triester coupling chemistry prevents the 5'-hydroxyl of one nucleoside from reacting with the 3'-activated phosphate-triester of another.

[0169] Regardless of the specific use, protecting groups are employed to protect a moiety on a molecule from reacting with another reagent. Protecting groups of the present invention have the following characteristics: they prevent selected reagents from modifying the group to which they are attached; they are stable (that is, they remain attached) to the synthesis reaction conditions; they are removable under conditions that do not adversely affect the remaining structure; and once removed, they do not react appreciably with the surface or surface-bound oligonucleotide.
[0170] In a preferred embodiment, the protecting groups will be photoactivatable. The properties and uses of photoreactive caged compounds have been reviewed. See McCray et al., Ann. Rev. of Biophys. and Biophys. Chem. (1989) 18:239-270. Preferably, the photosensitive protecting groups will be removable by radiation in the ultraviolet (UV) or visible portion of the electromagnetic spectrum. More preferably, the protecting groups will be removable by radiation in the near UV or visible portion of the spectrum. In some embodiments, however, activation may be performed by other methods such as localized heating, electron beam lithography, laser pumping, oxidation or reduction with microelectrodes, and the like. Sulfonyl compounds are suitable reactive groups for electron beam lithography. Oxidative or reductive removal is accomplished by exposure of the protecting group to an electric current source, preferably using microelectrodes directed to the predefined regions of the surface which are desired for activation.
[0171] The density of reagents attached to a silicon substrate may be varied by standard procedures. The surface area for attachment of reagents may be increased by modifying the silicon surface. For example, a matte surface may be machined or etched on the substrate to provide more sites for attachment of the particular reagents. Another way to increase the density of reagent binding sites is to increase the derivatization density of the silicon. Standard procedures for achieving this are described below.

[0172] [...] at high density. The substrate is then photolyzed for various predetermined times, which photoactivate the groups at a measurable rate, and then reacted with a capping reagent. By this method, the density of linker groups may be modulated by using a desired time and intensity of photoactivation.
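The time-and-intensity control described above can be sketched with a simple kinetic model. It assumes first-order photolysis with an effective rate proportional to light intensity, and assumes that the groups activated during the timed exposure are the ones capped, so that the fraction of linker groups left available is 1 − f(t). The rate constant is a hypothetical calibration value, not a figure from the text.

```python
import math


def fraction_activated(rate_constant: float, intensity: float, time_s: float) -> float:
    """Fraction of photolabile groups removed after `time_s` seconds, assuming first-order
    photolysis with an effective rate constant proportional to light intensity."""
    return 1.0 - math.exp(-rate_constant * intensity * time_s)


def exposure_time(rate_constant: float, intensity: float, target_fraction: float) -> float:
    """Exposure time needed to activate `target_fraction` of the groups under the same model."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction) / (rate_constant * intensity)


# Hypothetical calibration: k = 0.005 cm^2/(mW*s), intensity 14 mW/cm^2.
k, intensity = 0.005, 14.0
t = exposure_time(k, intensity, target_fraction=0.75)
remaining = 1.0 - fraction_activated(k, intensity, t)  # linkers left uncapped
print(f"expose {t:.1f} s; about {remaining:.0%} of linker groups remain available")
```

In practice the rate constant would be measured for the particular photolabile group and light source, for example by photolyzing for several times and quantifying the remaining groups.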

[0173] [...] the number of different sequences which may be provided may be limited by the density [...] on which the pattern is generated. In situations where the density is insufficiently high to allow the screening of the desired number of sequences, multiple substrates may be used to increase the number of sequences tested. Thus, the number of sequences tested may be increased by using a plurality of substrates. Because the VLSIPS apparatus is almost entirely automated, increasing the number of substrates does not lead to a significant increase in the number of manipulations which must be performed by humans. This again leads to greater reproducibility and speed in the handling of these multiple substrates.
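The number of substrates needed follows from a ceiling division of the sequence count by the per-substrate capacity. The capacity used below (250,000 sites per substrate) is a hypothetical figure for illustration.

```python
def substrates_needed(n_sequences: int, sites_per_substrate: int) -> int:
    """Smallest number of substrates whose combined sites hold n_sequences (ceiling division)."""
    if sites_per_substrate <= 0:
        raise ValueError("sites_per_substrate must be positive")
    return -(-n_sequences // sites_per_substrate)


all_10mers = 4 ** 10  # 1,048,576 possible 10-mers
print(substrates_needed(all_10mers, 250_000))  # 5 substrates at a hypothetical 250,000 sites each
```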



A. Instrumentation 



[0174] The concept of using VLSIPS generally allows a pattern or a matrix of reagents to be generated. The procedure for making the pattern may be performed by any of a number of different methods. An apparatus and instrumentation useful for generating a high-density VLSIPS substrate are described in detail in PCT publication no. WO90/15070.

B. Binary Masking 

[0175] The binary masking technique allows for producing a plurality of sequences based on the selection of either of two possibilities at any particular location. By a series of binary masking steps, the binary decision [...] determination. On a particular synthetic cycle, the decision is whether or not to add any particular one of the possible subunits. By treating various regions of the matrix pattern in parallel, the binary masking strategy provides the ability to carry out spatially addressable parallel synthesis.
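A simulation can make the binary decision per site and per step concrete. The sketch below uses one simple masking schedule: four exposures per cycle, one for each base, where each exposure either illuminates a site (so the base is coupled there) or leaves it dark. After k cycles (4k masks), the 4^k sites carry every possible k-mer exactly once. This schedule is an illustrative assumption; other mask sets are possible, and the chemistry is reduced to appending a letter.

```python
from itertools import product
from typing import Iterator, List, Tuple

BASES = "ACGT"


def masking_schedule(k: int) -> Iterator[Tuple[int, str, List[bool]]]:
    """Yield (cycle, base, mask) for 4*k exposure steps that together build every k-mer.
    mask[i] is True where site i is illuminated (deprotected) before `base` is coupled;
    site i receives `base` in `cycle` when base-4 digit `cycle` of i selects that base."""
    n_sites = 4 ** k
    for cycle in range(k):
        place = 4 ** (k - 1 - cycle)
        for base_index, base in enumerate(BASES):
            yield cycle, base, [(site // place) % 4 == base_index for site in range(n_sites)]


def simulate_synthesis(k: int) -> List[str]:
    """Apply the schedule: every illuminated site has the coupled base appended."""
    sites = [""] * (4 ** k)
    for _cycle, base, mask in masking_schedule(k):
        for site, exposed in enumerate(mask):
            if exposed:
                sites[site] += base
    return sites


array = simulate_synthesis(3)
print(len(array), len(set(array)), array[:4])  # 64 64 ['AAA', 'AAC', 'AAG', 'AAT']
```

Each mask is a yes/no pattern across all sites, and every region is treated in parallel within a step. This is what allows the number of steps to grow only linearly in k while the number of distinct products grows as 4^k.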

C. Synthetic Methods 

[0176] [...] optical methods, particular segments of the substrate can be irradiated with light to activate or deactivate blocking agents, e.g., to protect or deprotect particular chemical groups. By appropriate photo-exposure steps at appropriate times with appropriate masks and with appropriate reagents, the substrates can have known polymers synthesized at positionally defined regions on the substrate. Methods for synthesizing various substrates are described in PCT publication no. WO90/15070. By a sequential series of these photo-exposure and reaction manipulations, a defined matrix pattern of known sequences may be generated, and is typically [...]. In the nucleic acid synthesis embodiment, nucleosides used in the synthesis of DNA by photolytic methods will typically be one of the two forms shown below:




[Structures I and II]

[0177] B = Adenine, Cytosine, Guanine, or Thymine 

[0178] In I, the photolabile group at the 5' position is abbreviated NV (nitroveratryl), and in II, the group is abbreviated NVOC (nitroveratryloxycarbonyl). Although not shown above, the bases adenine, cytosine, and guanine contain exocyclic amino groups which must be protected during DNA synthesis. Thymine contains no exocyclic NH2 and therefore requires no protection. The standard protecting groups for these amines are shown below:



[Structures: protected Adenine (A), Cytosine (C), and Guanine (G)]

[0179] Other amides of the general formula

[Structure: general amide; R = ALKYL, ARYL]

where R may be alkyl or aryl, have been used.

[0180] Another type of protecting group, FMOC (9-fluorenylmethoxycarbonyl), is currently being used to protect the exocyclic amines of the three bases:

[Structures: FMOC-protected Adenine (A), Cytosine (C), and Guanine (G)]



[0181] The advantage of the FMOC group is that it is removed under mild conditions (dilute organic bases) and can be used for all three bases. The amide protecting groups require harsher conditions to be removed.

[0182] Nucleosides used as 5'-OH probes, useful in verifying correct VLSIPS synthetic function, have been the following:

[Structures III to VI: 5'-OH probe nucleosides]

[0183] These probes are used to detect where on a substrate photolysis has occurred by the attachment of ...




[0184] The method of attachment of the first nucleoside to the surface of the substrate depends on the functionality of the groups at the substrate surface. If the surface is amine-functionalized, an amide bond is made (see example below):



[Structure: first nucleoside coupled to an amine-functionalized surface through an amide bond]





[0185] If the surface is hydroxy-functionalized, a phosphate bond is made (see example below):
[Structure: NV-protected (NVO-) nucleoside coupled to a hydroxy-functionalized surface through a phosphate (O-P-O) linkage]



[0186] In both cases, the thymidine example is illustrated, but any one of the four phosphoramidite-activated nucleosides can be used in the first step.

[0187] Photolysis of the photolabile group NV or NVOC on the 5' positions of the nucleosides is carried out at about 362 nm with an intensity of 14 mW/cm² for 10 minutes, with the substrate side (the side carrying the photolabile groups) immersed in dioxane. After coupling of the next nucleoside is complete, the photolysis is repeated, followed by another coupling, until the desired oligomer is obtained.
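The photolysis/coupling cycle of [0187] can be sketched as follows, abstracting away the chemistry. It assumes one mask per (layer, base) pair, so that the complete set of 4^k probes of length k is built in 4k cycles; the helper names are hypothetical. The same block computes the radiant exposure implied by the stated conditions: 14 mW/cm² for 10 minutes (600 s) gives 8.4 J/cm².

```python
from itertools import product

BASES = "ACGT"


def photolithographic_plan(k):
    """Build every k-mer, one base per layer, using one mask per (layer, base)."""
    targets = ["".join(p) for p in product(BASES, repeat=k)]  # desired probe per site
    grown = [""] * len(targets)  # oligomer currently on each site
    cycles = 0
    for layer in range(k):
        for base in BASES:
            # The mask exposes (photodeprotects) only sites whose next base is `base`.
            exposed = [i for i, t in enumerate(targets) if t[layer] == base]
            for i in exposed:
                grown[i] += base  # coupling of the 5'-photoprotected nucleoside
            cycles += 1
    return grown, targets, cycles


def dose_j_per_cm2(intensity_mw_per_cm2, seconds):
    """Radiant exposure: mW/cm^2 multiplied by s gives mJ/cm^2; divide by 1000 for J/cm^2."""
    return intensity_mw_per_cm2 * seconds / 1000.0
```

For k = 3 the sketch uses 12 cycles to produce all 64 trinucleotides; the number of cycles grows linearly with k while the number of probes grows as 4^k.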

[0188] One of the most common 3'-O-protecting groups is the ester, in particular the acetate:



[Structure: 3'-O-acyl ester, R-C(=O)-O-; R = CH3, C6H5]




[0189] These groups can be removed by mild base treatment (0.1 N NaOH/MeOH or K2CO3/H2O/MeOH).
[0190] Another frequently used group is the silyl ether:



[Structure: 3'-O-silyl ether, R1R2R3Si-O-; R1 = R2 = R3 = CH3; or R1 = R2 = CH3, R3 = tBu; or R1 = R2 = R3 = iPr]



[0191] These groups can be removed under neutral conditions using 1 M tetra-n-butylammonium fluoride in THF, or under acid conditions.

[0192] Related to photodeprotection, the nitroveratryl group could also be used to protect the 3'-position:

[Structure: 3'-O-nitroveratryl ether, with OCH3 and NO2 substituents on the aryl ring]



[0193] Here, light (photolysis) would be used to remove these protecting groups.
[0194] A variety of ethers can also be used to protect the 3'-O-position:




[Structure: 3'-O-ether, R-O-; R = trityl, benzyl]



[0195] Removal of these groups usually involves acid or catalytic methods.
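The deprotection conditions named above can be collected into a small lookup table, for example to check which groups would survive a given step, such as the photolytic 5'-deprotection. This is a simplified sketch: the condition labels are shorthand for the treatments described in [0181] and [0189] to [0195], not a chemical compatibility model.

```python
# Removal conditions transcribed (as simplified labels) from the paragraphs above.
REMOVAL = {
    "NV/NVOC (5')": {"light"},
    "acetate/benzoate ester (3')": {"mild base"},
    "silyl ether (3')": {"fluoride", "acid"},
    "nitroveratryl (3')": {"light"},
    "trityl/benzyl ether (3')": {"acid", "catalytic"},
    "FMOC (exocyclic amine)": {"dilute organic base"},
}


def groups_removed_by(condition):
    """Protecting groups that the given condition would remove, sorted by name."""
    return sorted(group for group, conds in REMOVAL.items() if condition in conds)


def survives(group, condition):
    """True if the group is not listed as removable under the condition."""
    return condition not in REMOVAL[group]
```

For instance, `groups_removed_by("light")` shows that a 3'-nitroveratryl group would be cleaved together with the 5'-NV/NVOC group, while the ester, silyl, and ether groups would survive photolysis.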

[0196] Although the specificity of interactions at particular locations will usually be homogeneous, because a homogeneous polymer is synthesized at each defined location, for certain purposes it may be useful to have mixed polymers, with a commensurate mixed collection of interactions occurring at specific defined locations, or degeneracy-reducing analogues, which have been discussed above and show broad specificity in binding. A positive interaction signal at such a location may then result from any of a number of sequences contained therein.
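A location synthesized with mixed bases carries every sequence in the mixture, so a positive signal there may come from any member. The sketch below describes the mixture with IUPAC ambiguity codes; that notation, and the assumption that the target is written in the same sense as the probes, are choices made for illustration.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes (illustrative, not exhaustive).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "N": "ACGT"}


def expand(degenerate_probe):
    """All sequences present at a location synthesized with mixed bases."""
    choices = [IUPAC[ch] for ch in degenerate_probe]
    return ["".join(p) for p in product(*choices)]


def positive_signal_sources(degenerate_probe, target):
    """Mixture members that occur as exact substrings of the target."""
    return [s for s in expand(degenerate_probe) if s in target]
```

A probe written as "NN" stands for 16 different dinucleotides at one location, so a hit there narrows the possibilities only to that set.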

[0197] In another method of generating a matrix pattern on a substrate, preformed polymers may be individually attached at particular sites on the substrate. This may be performed by individually attaching reagents one at a time to specific positions on the matrix, a process which may be automated. Another way of generating a positionally defined matrix pattern on a substrate is to have individually specific reagents which interact with each specific position on the substrate. For example, oligonucleotides may be synthesized at defined locations on the substrate, so that the substrate surface has a plurality of regions, each having homogeneous oligonucleotides attached at that position.

[0198] In particular, at least four different substrate preparation procedures are available for treating a substrate surface: the standard VLSIPS method, polymeric substrates, Durapore™, and synthetic beads or fibers. The treatment labeled the "standard VLSIPS" method involves applying aminopropyltriethoxysilane to a glass surface.

[0199] The polymeric substrate approach involves either of two ways of generating a polymeric substrate. The first uses a high concentration of aminopropyltriethoxysilane (2-20%) in an aqueous ethanol solution (95%). This allows the silane compound to polymerize both in solution and on the substrate surface, which provides a high density of amines on the surface of the glass. This density contrasts with the standard VLSIPS method, which deposits only a monolayer on the substrate surface because of the anhydrous method used with the aforementioned silane.

[0200] The second polymeric method involves either the coating or covalent binding of an appropriate acrylic acid polymer onto the substrate surface. In particular, e.g., in DNA synthesis, a monomer such as hydroxypropyl acrylate is used to generate a high density of hydroxyl groups on the substrate surface, allowing for the formation of phosphate bonds. An example of such a compound is shown:




[Structure: example compound bearing a triethoxysilyl group, Si(OCH2CH3)3]
[0201] The method using a Durapore™ membrane (Millipore) consists of a polyvinylidene difluoride coating with crosslinked polyhydroxypropyl acrylate [PVDF-HPA]:



[Structure: PVDF-HPA surface bearing free hydroxyl (OH) groups]



Here the building up of, e.g., a DNA oligomer can be started immediately, since phosphate bonds to the surface can be formed in the first step with no need for modification. A nucleotide dimer (5'-C-T-3') has been successfully made on this substrate in our labs.

[0202] The fourth method utilizes synthetic beads or fibers. This would use another substrate, such as a Teflon copolymer graft bead or fiber, which is covalently coated with a hydrophilic organic layer terminating in hydroxyl sites (available from Molecular Biosystems, Inc.). This would offer the same advantage as the Durapore™ membrane.

[0203] A matrix pattern of new reagents may be targeted to each specific oligonucleotide position by attaching to each reagent an oligonucleotide to which the substrate-bound form is complementary. For instance, a number of regions may have homogeneous oligonucleotides synthesized at various locations. Oligonucleotides complementary to each of these can be individually generated and linked to particular specific reagents. Often these specific reagents will be antibodies. As each of these oligonucleotides is specific for its complementary oligonucleotide, each of the specific reagents will bind through the oligonucleotide to the appropriate matrix position. A single step in which a combination of different specific reagents, each attached to a particular oligonucleotide, is applied will thereby bind each reagent to its complement at the defined matrix position. The oligonucleotides will typically then be covalently attached, using, e.g., an acridine dye, by photocrosslinking. Psoralen is commonly used for photocrosslinking purposes; see, e.g., Song et al. (1979) Photochem. Photobiol. 29:1177-1197; Cimino et al. (1985) Ann. Rev. Biochem. 54:1151-1193. This process allows a single attachment manipulation to attach all of the specific reagents to the matrix at defined positions, and results in the specific reagents being homogeneously located at defined positions.
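The complementary-tag addressing in [0203] amounts to a lookup by reverse complement. In this minimal sketch, which assumes exact Watson-Crick pairing and antiparallel orientation, a tagged reagent is placed at the position whose surface-bound oligonucleotide is the reverse complement of its tag; reagent names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (antiparallel Watson-Crick partner)."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_reagents(anchors, tagged_reagents):
    """anchors: matrix position -> surface-bound oligonucleotide.
    tagged_reagents: reagent name -> oligonucleotide tag linked to the reagent.
    Returns position -> reagent for every tag that is complementary to an anchor."""
    position_of = {seq: pos for pos, seq in anchors.items()}
    placement = {}
    for reagent, tag in tagged_reagents.items():
        pos = position_of.get(revcomp(tag))  # tag must pair with exactly one anchor
        if pos is not None:
            placement[pos] = reagent
    return placement
```

All reagents can be applied as one mixture, as the text describes, because the pairing itself sorts them onto their positions; a tag with no complementary anchor is simply left unplaced.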
D. Surface Immobilization
1. caged biotin 



[0204] An alternative method of attaching reagents in a positionally defined matrix pattern is to use a caged biotin. The caged biotin has a photosensitive blocking moiety which prevents it from binding avidin. At positions where the photolithographic process has removed the blocking group, high-affinity biotin sites are generated. Thus, by a sequential series of photolithographic deblocking steps, each followed by exposure of the deblocked regions to appropriate biotin-containing reagents, only those locations where deblocking takes place will form an avidin-biotin interaction. Because avidin-biotin binding is very tight, the binding will usually be virtually irreversible.
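The sequential deblocking of [0204] can be modelled abstractly: each step uncages biotin at the exposed sites, and the reagent applied in that step binds only at uncaged sites not yet occupied. Treating binding as irreversible follows the text; the site numbering and reagent names are hypothetical.

```python
def caged_biotin_patterning(steps):
    """steps: ordered list of (exposed_sites, reagent) pairs, one per deblocking step.
    Returns site -> reagent bound there."""
    uncaged = set()  # sites whose photosensitive blocking group has been removed
    occupant = {}    # binding is treated as irreversible, so sites fill only once
    for exposed, reagent in steps:
        uncaged |= set(exposed)  # light removes the cage at the exposed sites
        for site in uncaged:
            if site not in occupant:  # only free, uncaged sites can bind the new reagent
                occupant[site] = reagent
    return occupant
```

Re-exposing an already occupied site has no effect in this model, so the order of the deblocking steps determines the final pattern.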

2. crosslinked interactions 

[0205] The surface immobilization may also take place by photocrosslinking of defined oligonucleotides linked to specific reagents. After hybridization of the oligonucleotides, the oligonucleotides may be covalently crosslinked by photoactivation of a dye. Other useful crosslinking reagents are described in Dattagupta et al. (1985) U.S. Pat. No. 4,542,102, and (1987) U.S. Pat. No. 4,713,326.

[0206] In another embodiment, biological polymers from colonies or phage plaques may be transferred directly onto a silicon substrate. For example, a colony plate may be transferred onto a substrate having a generic oligonucleotide sequence which hybridizes to a complementary generic sequence contained on all of the vectors. This will specifically bind only those molecules which are actually contained in vectors carrying the complementary sequence. This immobilization allows for producing a matrix onto which a sequence-specific reagent can bind, or for other purposes. In a further embodiment, a plurality of different vectors, each having a specific oligonucleotide attached to the vector, may be specifically attached to particular regions on a matrix having a complementary oligonucleotide attached thereto.

VIII. HYBRIDIZATION/SPECIFIC INTERACTION 
A. General 

[0207] As discussed previously, the substrates described herein may be used for specific interactions with sequence-specific targets or probes.

[0208] The availability of substrates having the entire repertoire of possible sequences of a defined length opens up the possibility of sequencing by hybridization. This sequencing may be de novo determination of an unknown sequence, particularly of a nucleic acid, verification of a sequence determined by another method, or an investigation of changes in a previously sequenced gene, locating and identifying specific changes. For example, Maxam and Gilbert sequencing techniques are often applied to sequences which have been determined by the Sanger and Coulson method. Each of those sequencing technologies has problems resolving particular types of sequences. Sequencing by hybridization may serve as a third, independent method for verifying other sequencing techniques. See, e.g., (1988) Science 242:1245.
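For sequencing by hybridization, a complete array of probes of length k holds 4^k sequences (65,536 for k = 8), and under ideal hybridization the positive positions correspond to the set of k-mers present in the target, often called its spectrum. A minimal sketch, writing probes in the target's sense (the physical probes are their complements):

```python
def repertoire_size(k):
    """Number of distinct probes in a complete array of length-k oligonucleotides."""
    return 4 ** k


def spectrum(target, k):
    """Set of k-mers present in the target; under ideal hybridization these are
    the probe sequences (in the target's sense) expected to give a signal."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}
```

Reconstructing the target from its spectrum is the overlap problem discussed under Data Analysis.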

[0209] In addition, the ability to provide a large repertoire of particular sequences allows use of short-subsequence hybridization as a means to fingerprint a polynucleotide sample. For example, fingerprinting with a high degree of sequence-matching specificity may be used for identifying highly similar samples, e.g., those exhibiting high homology. Fingerprinting may also provide a means for determining classifications of particular sequences. This should allow determination of whether particular genomes of bacteria, phage, or even higher cells might be related to one another.

[0210] In addition, fingerprinting may be used to identify an individual source of a biological sample. See, e.g., Lander (1989) Nature 339:501-505, and references therein. For example, a DNA fingerprint may be used to determine whether a genetic sample arose from another individual. This would be particularly useful in various sorts of forensic tests to determine, e.g., paternity or sources of blood samples. Significant detail on the particulars of genetic fingerprinting for identification purposes is described in, e.g., Morris et al. (1989) "Biostatistical evaluation of evidence from continuous allele frequency distribution deoxyribonucleic acid (DNA) probes in reference to disputed paternity or identity," J. Forensic Sci. 34:1311-1317, and Neufeld et al. (1990) Scientific American 262:46-53.
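Comparing two hybridization fingerprints can be sketched as comparing sets of positive probes. The Jaccard index used here is one simple choice, assumed for illustration; forensic use requires the statistical treatment discussed in the references above, which this sketch does not attempt.

```python
def jaccard(a, b):
    """Similarity of two fingerprints given as sets of positive probe sequences."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def best_match(query, references):
    """Name of the reference fingerprint most similar to the query."""
    return max(references, key=lambda name: jaccard(query, references[name]))
```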

[0211] In another embodiment, a fingerprinting-like procedure may be used for classifying cell types by analyzing a pattern of specific nucleic acids present in the cell, specifically RNA expression patterns. This may also be useful in defining the temporal stage of development of cells, e.g., stem cells or other cells which undergo temporal changes in development. For example, the stage of a cell, or group of cells, may be tested or defined by isolating a sample of mRNA from the population and testing to see what sequences are present in the messenger population. Direct samples or amplified samples (e.g., by polymerase chain reaction) may be used. Where particular mRNA or other nucleic acid sequences are, or are shown to be, characteristic of particular developmental stages, physiological states, or other conditions, this fingerprinting method may define them.
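The classification in [0211] could be sketched as matching a presence/absence call profile over an ordered probe panel against reference profiles for known stages. The Hamming-distance nearest-neighbour rule and the stage names are illustrative assumptions.

```python
def hamming(a, b):
    """Number of probes whose presence/absence call differs between two profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same probe panel")
    return sum(x != y for x, y in zip(a, b))


def classify_stage(profile, references):
    """profile and each reference: tuples of 0/1 calls over the same ordered panel.
    Returns the reference stage with the fewest differing calls."""
    return min(references, key=lambda stage: hamming(profile, references[stage]))
```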

[0212] The present invention may also be used for mapping sequences within a larger segment. This may be performed by at least two methods, particularly in reference to nucleic acids. Often, enormous segments of DNA are subcloned into a large plurality of subsequences. Ordering these subsequences may be important in determining the overlaps of sequences upon nucleotide determinations. Mapping may be performed by immobilizing particularly large segments onto a matrix using the VLSIPS technology. Alternatively, sequences may be ordered by virtue of subsequences shared between overlapping segments. See, e.g., Craig et al. (1990) Nuc. Acids Res. 18:2653-2660; Michiels et al. (1987) CABIOS 3:203-210; and Olson et al. (1986) Proc. Natl. Acad. Sci. USA 83:7826-7830.
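Ordering subclones by shared subsequences can be sketched by counting, for each pair of clones, the probes that hybridize to both; pairs at or above a threshold are candidate overlaps. The threshold value and the clone names are hypothetical, and a real map would also need to tolerate missing and false-positive hits.

```python
from itertools import combinations


def overlapping_pairs(clone_hits, min_shared=2):
    """clone_hits: clone name -> set of probes that hybridized to that clone.
    Returns (clone_a, clone_b, shared_count) for pairs sharing >= min_shared probes."""
    pairs = []
    for a, b in combinations(sorted(clone_hits), 2):
        shared = len(clone_hits[a] & clone_hits[b])
        if shared >= min_shared:
            pairs.append((a, b, shared))
    return pairs
```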

B. Important Parameters 

[0213] The extent of specific interaction between reagents immobilized to the VLSIPS substrate and another sequence-specific reagent may be modified by the conditions of the interaction. Sequencing embodiments typically require high-fidelity hybridization and the ability to discriminate perfect matching from imperfect matching. Fingerprinting and mapping embodiments may be performed using less stringent conditions, or in some embodiments very highly stringent conditions, depending upon the circumstances.
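Discriminating perfect from imperfect matching can be illustrated at the string level: slide a probe along the target and count mismatches in each window. This sketch ignores thermodynamics entirely and assumes the probe is written in the target's sense; it only shows what a perfect match (0 mismatches) versus an imperfect match (1 or more) means for a given alignment.

```python
def mismatches(probe, window):
    """Number of positions at which the probe and the target window differ."""
    return sum(a != b for a, b in zip(probe, window))


def best_alignment(probe, target):
    """(position, mismatch count) of the best-matching window of the target."""
    windows = range(len(target) - len(probe) + 1)
    return min(((i, mismatches(probe, target[i:i + len(probe)])) for i in windows),
               key=lambda pair: pair[1])
```

High-stringency conditions aim to let only the 0-mismatch case give a signal; lower stringency may also let 1-mismatch windows hybridize.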
[0214] In a nucleic acid hybridization embodiment, the specificity and kinetics of hybridization have been described, e.g., in Wetmur and Davidson (1968) J. Mol. Biol. 31:349-370, Britten and Kohne (1968) Science 161:529-530, and Kanehisa (1984) Nuc. Acids Res. 12:203-213. Parameters which are well known to affect the specificity and kinetics of reaction include salt conditions, ionic composition of the solvent, hybridization temperature, length of oligonucleotide matching sequences, guanine and cytosine (GC) content, presence of hybridization accelerators, pH, specific bases found in the matching sequences, solvent conditions, and addition of organic solvents.
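Two rough melting-temperature estimates illustrate how length, GC content, and salt enter. The Wallace rule (2 °C per A/T plus 4 °C per G/C) applies only to short oligonucleotides, and the salt-adjusted expression 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N is one commonly quoted empirical approximation; neither is taken from the references above, and both are sketches rather than nearest-neighbour calculations.

```python
import math


def wallace_tm(seq):
    """Wallace rule, rough Tm in deg C for short oligonucleotides: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def salt_adjusted_tm(seq, na_molar):
    """Empirical approximation: 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N."""
    s = seq.upper()
    n = len(s)
    percent_gc = 100.0 * (s.count("G") + s.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc - 600.0 / n
```

Both expressions rise with GC content, and the second also falls as salt is lowered or the probe shortened, matching the qualitative dependencies listed in [0214] and [0215].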
[0215] In particular, the salt conditions required for driving highly mismatched sequences to completion typically include a high salt concentration. The typical salt used is sodium chloride (NaCl); however, other ionic salts may be utilized, e.g., KCl. Depending on the desired stringency of hybridization, the salt concentration will often be less than about 3 molar, more often less than about 2.5 molar, usually less than about 2 molar, and more usually less than about 1.5 molar. For applications directed towards higher-stringency matching, the salt concentrations would typically be lower. Ordinary high-stringency conditions will utilize salt concentrations of less than about 1 molar, more often less than about 750 millimolar, usually less than about 500 millimolar, and may be as low as about 250 or 150 millimolar.
[0216] The kinetics of hybridization and the stringency of hybridization both depend upon the temperature at which the hybridization is performed and the temperature at which the washing steps are performed. Temperatures for low-stringency hybridization steps would typically be lower, e.g., ordinarily at least about 15°C, more ordinarily at least about 20°C, usually at least about 25°C, and more usually at least about 30°C. For those applications requiring high-stringency hybridization, or fidelity of hybridization and sequence matching, the temperatures at which hybridization and washing steps are performed would typically be high. For example, temperatures in excess of about 35°C would often be used, more often in excess of about 40°C, usually at least about 45°C, and occasionally even temperatures as high as about 50°C or 60°C or more. Of course, the hybridization of oligonucleotides may be disrupted by even higher temperatures. Thus, for stripping of targets from substrates, as discussed below, temperatures as high as 80°C, or even higher, may be used.

[0217] The base composition of the specific oligonucleotides involved in hybridization affects the melting temperature and the stability of hybridization, as discussed in the above references. However, the bias of GC-rich sequences to hybridize faster and retain stability at higher temperatures can be compensated for by including various buffers in the hybridization incubation or wash steps. Sample buffers which accomplish this result include the triethyl- and trimethylammonium buffers. See, e.g., Wood et al. (1985) Proc. Natl. Acad. Sci. USA 82:1585-1588 and Khrapko et al. (1989) FEBS Letters 256:118-122.

[0218] The rate of hybridization can also be affected by the inclusion of particular hybridization accelerators. These hybridization accelerators include volume exclusion agents such as dextran sulfate or polyethylene glycol (PEG). Dextran sulfate is typically included at a concentration of between 1% and 40% by weight. The actual concentration selected depends upon the application, but typically a faster hybridization is desired, and the concentration is optimized for the system in question. Dextran sulfate is often included at a concentration of between about 0.5% and 2% by weight, or between about 0.5% and 5%. Alternatively, proteins which accelerate hybridization may be added, e.g., the recA protein found in E. coli, or other homologous proteins.
[0219] Of course, the specific hybridization conditions will be selected to correspond to a discriminatory condition 
which provides a positive signal where desired but fails to show a positive signal at affinities where interaction is not 
desired. This may be determined by a number of titration steps or with a number of controls which will be run during 
the hybridization and/or washing steps to determine at what point the hybridization conditions have reached the stage 
of desired specificity. 
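The titration in [0219] can be sketched as scanning control signals from least to most stringent and stopping at the first condition where the perfect-match control exceeds the mismatch control by a chosen ratio. The default ratio of 5 and the control values in the test are hypothetical.

```python
def choose_stringency(titration, min_ratio=5.0):
    """titration: list of (condition, perfect_match_signal, mismatch_signal) from
    control probes, ordered from least to most stringent.
    Returns the first discriminating condition, or None if none qualifies."""
    for condition, perfect, mismatch in titration:
        if perfect <= 0:
            continue  # no positive signal at all: not a useful condition
        if mismatch <= 0 or perfect / mismatch >= min_ratio:
            return condition
    return None
```

Choosing the least stringent condition that still discriminates keeps the perfect-match signal as strong as possible.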

IX. DETECTION METHODS 

[0220] Methods for detection depend upon the label selected. The criteria for selecting an appropriate label are discussed below; however, a fluorescent label is preferred because of its extreme sensitivity and simplicity. Standard labeling procedures are used to determine the positions where interactions between a sequence and a reagent take place. For example, if a target sequence is labeled and exposed to a matrix of different probes, only those locations where probes do interact with the target will exhibit any signal. Alternatively, other methods may be used to scan the matrix to determine where interaction takes place. Of course, the spectrum of interactions may be determined in a temporal manner by repeated scans of interactions which occur at each of a multiplicity of conditions. However, instead of testing each individual interaction separately, a multiplicity of sequence interactions may be determined simultaneously on a matrix.

A. Labeling Techniques 

[0221] The target polynucleotide may be labeled by any of a number of convenient detectable markers. A fluorescent label is preferred because it provides a very strong signal with low background. It is also optically detectable at high resolution and sensitivity through a quick scanning procedure. Other potential labeling moieties include radioisotopes, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, magnetic labels, and linked enzymes.

[0222] Another method for labeling does not require incorporation of a labeling moiety. The target may be exposed to the probes, and a double-stranded hybrid is formed only at those positions where the target is complementary to the probe. Addition of a double-strand-specific reagent will detect where hybridization takes place. An intercalating dye such as ethidium bromide may be used, as long as the probes do not fold back on themselves to a significant extent, forming hairpin loops. See, e.g., Sheldon et al. (1986) U.S. Pat. No. 4,582,789. However, the hairpin stems in short oligonucleotide probes would typically be too short to form a stable duplex.
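Whether a short probe can fold back on itself can be screened crudely by looking for a stem whose reverse complement occurs downstream after a minimal loop. The stem and loop lengths are assumed defaults, and only exact Watson-Crick stems are considered; this is a string screen, not a folding prediction.

```python
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def can_form_hairpin(probe, min_stem=4, min_loop=3):
    """True if some stretch of min_stem bases has its reverse complement further
    downstream, separated by at least min_loop unpaired bases (DNA letters only)."""
    n = len(probe)
    for i in range(n - min_stem + 1):
        stem = probe[i:i + min_stem]
        partner = "".join(COMP[b] for b in reversed(stem))
        # The pairing arm must begin at least min_loop bases after the stem ends.
        if partner in probe[i + min_stem + min_loop:]:
            return True
    return False
```

Probes that pass this screen are less likely to give background staining with an intercalating dye in the absence of target.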

[0223] In another embodiment, different targets may be simultaneously sequenced where each target has a different 
label. For instance, one target could have a green fluorescent label and a second target could have a red fluorescent 
label. The scanning step will distinguish sites of binding of the red label from those binding the green fluorescent label. 
Each sequence can be analyzed independently from one another. 
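With two labels, each scan channel is scored independently, so one position can be positive for one target, both, or neither. A minimal sketch with a hypothetical common threshold:

```python
def assign_colors(sites, threshold):
    """sites: position -> (green_intensity, red_intensity).
    Returns (green_hits, red_hits): positions above threshold in each channel."""
    green_hits = {pos for pos, (g, r) in sites.items() if g >= threshold}
    red_hits = {pos for pos, (g, r) in sites.items() if r >= threshold}
    return green_hits, red_hits
```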

[0224] Suitable chromogens include molecules and compounds which absorb light in a distinctive range of wavelengths so that a color may be observed, or which emit light when irradiated with radiation of a particular wavelength or wavelength range, e.g., fluorescers.

[0225] A wide variety of suitable dyes are available, chosen primarily to provide an intense color with minimal absorption by their surroundings. Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium dyes.

[0226] A wide variety of fluorescers may be employed, either by themselves or in conjunction with quencher molecules. Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary functionalities include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes, and flavin. Individual fluorescent compounds which have functionalities for linking, or which can be modified to incorporate such functionalities, include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamine isothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auromine-O; 2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine; 4-(3'-pyrenyl)butyrate; d-3-aminodesoxyequilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'-(vinylene-p-phenylene)bisbenzoxazole; p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-benzimidazolyl)-phenyl]maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazurin; 4-chloro-7-nitro-2,1,3-benzoxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone.

[0227] Desirably, fluorescers should absorb light above about 300 nm, preferably above about 350 nm, and more preferably above about 400 nm, usually emitting at wavelengths more than about 10 nm longer than the wavelength of the light absorbed. It should be noted that the absorption and emission characteristics of the bound dye may differ from those of the unbound dye. Therefore, when referring to the various wavelength ranges and characteristics of the dyes, it is intended to indicate the dyes as employed, and not the dye which is unconjugated and characterized in an arbitrary solvent.
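The spectral preferences in [0227] reduce to two checks on the conjugated dye's measured maxima. A minimal sketch; the default thresholds follow the "more preferably" values in the text, and the wavelengths in the test are illustrative, not measured data.

```python
def meets_criteria(abs_nm, em_nm, min_abs=400.0, min_shift=10.0):
    """True if the dye absorbs above min_abs nm and emits more than min_shift nm
    above its absorption wavelength (values for the dye as conjugated)."""
    return abs_nm > min_abs and (em_nm - abs_nm) > min_shift
```

Passing `min_abs=300.0` or `min_abs=350.0` applies the broader preferences stated in the same paragraph.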
[0228] Fluorescers are generally preferred because by irradiating a fluorescer with light, one can obtain a plurality 
of emissions. Thus, a single label can provide for a plurality of measurable events. 

[0229] A detectable signal may also be provided by chemiluminescent and bioluminescent sources. Chemiluminescent sources include a compound which becomes electronically excited by a chemical reaction and may then emit light which serves as the detectable signal, or donate energy to a fluorescent acceptor. A diverse number of families of compounds have been found to provide chemiluminescence under a variety of conditions. One family of compounds is 2,3-dihydro-1,4-phthalazinedione. The most popular compound is luminol, which is the 5-amino compound. Other members of the family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. These compounds can be made to luminesce with alkaline hydrogen peroxide or calcium hypochlorite and base. Another family of compounds is the 2,4,5-triphenylimidazoles, with lophine as the common name for the parent product. Chemiluminescent analogs include para-dimethylamino and -methoxy substituents. Chemiluminescence may also be obtained with oxalates, usually oxalyl active esters, e.g., p-nitrophenyl, and a peroxide, e.g., hydrogen peroxide, under basic conditions. Alternatively, luciferins may be used in conjunction with luciferase or lucigenins to provide bioluminescence.
[0230] Spin labels are provided by reporter molecules with an unpaired electron spin which can be detected by electron spin resonance (ESR) spectroscopy. Exemplary spin labels include organic free radicals and transition metal complexes, particularly of vanadium, copper, iron, and manganese, and the like. Exemplary organic free radicals include nitroxide free radicals.
B. Scanning System

[0231] With the automated detection apparatus, the correlation of specific positional labeling is converted to the presence on the target of sequences for which the reagents have specificity of interaction. Thus, the positional information is directly converted to a database indicating what sequence interactions have occurred. For example, in a nucleic acid hybridization application, the sequences which have interacted between the substrate matrix and the target molecule can be listed directly from the positional information. The detection system used is described in PCT publication no. WO90/15070. Although the detection described therein uses a fluorescence detector, the detector may be replaced by a spectroscopic or other detector. The scanning system may make use of a moving detector relative to a fixed substrate, a fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other apparatus can be used to transfer the signal directly to the detector.
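The conversion from positional information to a list of interacting sequences in [0231] is essentially a lookup in the synthesis layout. A minimal sketch with a hypothetical layout:

```python
def interacting_sequences(layout, positives):
    """layout: matrix position -> probe sequence synthesized there.
    positives: positions reported positive by the scanner.
    Returns the interacting probe sequences, sorted for a stable listing."""
    return sorted(layout[pos] for pos in positives if pos in layout)
```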

[0232] The detection method will typically also incorporate some signal processing to determine whether the signal at a particular matrix position is a true positive or may be a spurious signal. For example, a signal from a region which has an actual positive signal may spread over and provide a positive signal in an adjacent region which should not have one. This may occur, e.g., where the scanning system does not discriminate with sufficiently high pixel resolution to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory, show a uniform signal at each pixel location. Thus, a plot of the number of pixels at each signal intensity should show a single, tightly clustered intensity. Regions where the signal intensities show a fairly wide dispersion may be particularly suspect, and the scanning system may be programmed to scan those positions more carefully.
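The uniformity test in [0232] can be sketched with the coefficient of variation of the pixel intensities inside one matrix position: a bright but dispersed position is flagged for rescanning. The coefficient-of-variation criterion and its 0.25 cutoff are assumptions made for illustration.

```python
from statistics import mean, pstdev


def classify_cell(pixels, on_threshold, max_cv=0.25):
    """pixels: intensities of the pixels inside one matrix position.
    Returns 'negative', 'positive', or 'suspect' (bright but non-uniform,
    e.g. signal spreading in from a neighbouring position)."""
    m = mean(pixels)
    if m <= 0 or m < on_threshold:
        return "negative"
    cv = pstdev(pixels) / m  # dispersion relative to the mean intensity
    return "positive" if cv <= max_cv else "suspect"
```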
[0233] In another embodiment, once the sequence of a target has been determined at a particular location, the overlapping portion of the next subsequence is necessarily known. Thus, the system can compare the possibilities for the next adjacent position with each other. Typically, only one of the possible adjacent sequences should give a positive signal, and the system might be programmed to compare each of these possibilities and select the one which gives a strong positive. In this way, the system can also simultaneously provide some measure of the reliability of the determination by indicating what the average signal-to-background ratio actually is.
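Because the k-1 overlap fixes all but one base, only four candidate probes compete at each step in [0233]. This sketch picks the strongest candidate and reports its signal-to-background ratio; the intensities in the test are hypothetical, and `background` must be positive.

```python
def extend(current, intensities, background):
    """current: last determined k-mer; intensities: probe -> measured signal.
    Returns (chosen_base, signal_to_background) for the strongest of the four
    probes that share current's last k-1 bases."""
    overlap = current[1:]
    candidates = {b: intensities.get(overlap + b, 0.0) for b in "ACGT"}
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / background
```

Averaging the returned ratios over all extension steps gives the overall reliability figure mentioned in the text.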
[0234] More sophisticated signal processing techniques can be applied to the initial determination of whether a positive signal exists.

[0235] From a listing of those sequences which interact, data analysis may be performed on a series of sequences. For example, in a nucleic acid sequencing application, each of the sequences may be analyzed for its overlap regions, and the original target sequence may be reconstructed from the collection of specific subsequences obtained. Other sorts of analyses for different applications may also be performed, and because the scanning system interfaces directly with a computer, the information need not be transferred manually. This provides the ability to handle large amounts of data with very little human intervention, a significant advantage over manual manipulations. Increased throughput and reproducibility are thereby provided by the automation of the vast majority of steps in any of these applications.

X. DATA ANALYSIS

A. General 

[0236] Data analysis will typically involve aligning the proper sequences with their overlaps to determine the target sequence. Although the target "sequence" may not correspond to any specific molecule, especially where the target sequence is broken and fragmented in the sequencing process, the sequence corresponds to a contiguous sequence of the subfragments.

[0237] The data analysis can be performed by a computer using an appropriate program. See, e.g., Drmanac, R. et
al. (1989) Genomics 4:114-128; and a commercially available analysis program available from the Genetic Engineering
Center, P.O. Box 794, 11000 Belgrade, Yugoslavia. Although the specific manipulations necessary to reassemble the
target sequence from fragments may take many forms, one embodiment uses a sorting program to sort all of the
subsequences using a defined hierarchy. The hierarchy need not correspond to any physical hierarchy, but provides a
means to determine, in order, which subfragments have actually been found in the target sequence. In this manner,
overlaps can be checked and found directly rather than by searching the entire set after each selection step. For
example, where the oligonucleotide probes are 10-mers, the probes can be sorted on their first 9 positions. A particular
subsequence can be selected, as in the examples, to determine where the process starts. Analogous to the theoretical
example provided above, the sorting procedure makes it possible to immediately find the position of the subsequence
which contains the first 9 positions and to check whether more than one subsequence shares those 9 positions. In fact,
the computer can easily generate all of the possible target sequences which contain a given combination of
subsequences. Typically there will be only one, but in various situations there will be more.
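The sorting approach described above can be illustrated with a short Python sketch. It is not the program referred to in the text: it assumes error-free data in which every k-mer of the target was detected exactly once (no false positives, no repeated k-mers), indexes the detected k-mers by their first k-1 positions, and then enumerates every target sequence consistent with the full set. The function name `candidate_targets` is hypothetical.

```python
from collections import defaultdict


def candidate_targets(kmers, target_len):
    """Return every sequence of length target_len that uses each detected
    k-mer exactly once as one of its consecutive k-mers.

    Assumes error-free hybridization data: every k-mer of the target was
    detected once, none is a false positive, and no k-mer repeats.
    """
    kmers = set(kmers)
    if not kmers:
        raise ValueError("no k-mers supplied")
    k = len(next(iter(kmers)))
    if k < 2 or any(len(km) != k for km in kmers):
        raise ValueError("k-mers must share one length of at least 2")

    # Sort/index the k-mers by their first k-1 positions so the successor of
    # a partial sequence is found by direct lookup, not a search of the set.
    by_prefix = defaultdict(list)
    for km in sorted(kmers):
        by_prefix[km[:-1]].append(km)

    results = []

    def extend(seq, used):
        if len(seq) == target_len:
            if len(used) == len(kmers):
                results.append(seq)
            return
        # The next k-mer must overlap the last k-1 bases of seq.
        for km in by_prefix.get(seq[-(k - 1):], []):
            if km not in used:
                extend(seq + km[-1], used | {km})

    # Any detected k-mer may be the start of the target.
    for start in sorted(kmers):
        extend(start, {start})
    return results
```

For example, `candidate_targets({"ATG", "TGC", "GCG", "CGT"}, 6)` yields the single candidate `"ATGCGT"`. For the 3-mers of `"GCAGCTGC"`, in which the 2-mer `GC` recurs, several candidates are returned, including `"GCTGCAGC"`: this is the case where more than one target sequence is consistent with the data. The exhaustive search is exponential in the worst case and is meant only to make the idea concrete.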
A flow chart for such a program is provided in Figure 1. In general terms, the program
provides for automated scanning of the substrate to determine the positions of probe and target interaction. Simple
processing of the intensity of the signal may be incorporated to filter out clearly spurious signals. The positions showing
positive interaction are then matched with the sequence specificity of the specific matrix positions to generate a set of
matching subsequences. This information is then correlated with other available sequence information, e.g., restriction
fragment analysis. The sequences are then aligned using overlap data, thereby leading to possible corresponding
target sequences which will, optimally, correspond to a single target sequence.
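The scanning and filtering steps of the flow chart can be sketched as follows. This is an illustrative assumption, not the patented procedure: the scanner output is modeled as a mapping from matrix position to signal intensity, background is estimated as the median intensity (on the assumption that most probes do not bind the target), and a site counts as positive when its signal reaches a hypothetical fold-over-background cutoff. The positive positions are then translated into probe subsequences through the known layout of the matrix.

```python
from statistics import median


def positive_subsequences(intensities, probe_at, fold_over_background=3.0):
    """Map scanned intensities to the set of probe sequences that bound target.

    intensities: dict of matrix position -> measured signal
    probe_at:    dict of matrix position -> probe sequence synthesized there
    fold_over_background: hypothetical cutoff used to discard weak, likely
        spurious signals
    """
    # Most probes on the matrix are expected not to hybridize, so the median
    # signal serves as a crude background estimate.
    background = median(intensities.values())
    cutoff = fold_over_background * background
    return {probe_at[pos] for pos, signal in intensities.items() if signal >= cutoff}
```

With intensities `{(0, 0): 10, (0, 1): 12, (1, 0): 200, (1, 1): 11, (2, 0): 150}`, the median background is 12 and the cutoff 36, so only the probes at `(1, 0)` and `(2, 0)` are reported. The resulting set of subsequences is the input to the overlap alignment described in the text. A median-based cutoff is only one simple choice; it becomes unreliable if a large fraction of sites bind target.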

B. Hardware 

A computer system may be used to run a sequencing program. The program may be written to integrate the detecting and
scanning steps together and will typically be dedicated to a particular scanning apparatus. However, the components and
functional steps may be separated, and the scanning system may provide an output, e.g., through a tape or an electronic
connection, into a separate computer which runs the sequencing program.

The computer may be any of a number of machines provided by standard computer manufacturers,
e.g., IBM compatible machines, Apple™ machines, VAX machines, and others, which may often use a UNIX™ operating
system. Alternatively, other computing architectures may be employed. These architectures may include neural
networks, implemented in hardware and/or software. Of course, the hardware used to run the analysis program
will typically determine what programming language would be used.

C. Software 

The software would be readily developed by a person of ordinary skill in the programming art, following the flow
chart provided, or based upon the input provided and the desired result.

The exemplary embodiment described is a polynucleotide sequencing system. However, the theoretical and
mathematical manipulations necessary for data analysis of other linear molecules are conceptually similar.

XI. SUBSTRATE REUSE 

[0242] Where a substrate is made with specific reagents that are relatively insensitive to the handling and processing
steps of a single use, the substrate may often be reused. Reuse requires that the bound target molecules be removed
while the recognition molecules are left intact. Of course, it is preferred that the manipulations and conditions
be selected so as to be mild and to not affect the substrate. For example, if a substrate is acid labile, a neutral pH would
be preferred in all handling steps. Similar sensitivities would be carefully respected where recycling is desired.

A. Removal of Label 

Typically, in recycling, the previously attached specific interaction would be disrupted and removed. This will
typically involve exposing the substrate to conditions under which the interaction between the target and the specific
reagents is disrupted. Alternatively, the substrate may be exposed to conditions where the target is destroyed. For
example, where the probes and the target are nucleic acids, heating and washing will often be sufficient to strip the
interactions. Additional reagents may be added, such as detergents and organic or inorganic solvents, which disrupt
the interaction between the specific reagents and target.

B. Storage and Preservation 

As indicated above, the matrix will typically be maintained under conditions where the matrix itself and the attached
reagents are preserved. Various specific preservatives may be added which prevent degradation. For example, if the
reagents are acid or base labile, a neutral pH buffer will typically be added. It is also desired to avoid the growth of
organisms which may destroy organic reagents attached thereto. For this reason, a preservative may be added. The
chemical preservative should also be selected to preserve the chemical nature of the linkages and other components of
the substrate. Typically, a detergent may also be included.

C. Processes to Avoid Degradation of Oligomers 

In particular, a substrate comprising a large number of oligomers will be treated in a fashion which is known
to maintain the quality and integrity of oligonucleotides. These measures include storing the substrate in a carefully
controlled environment under conditions of lower temperature, cation depletion (EDTA and EGTA), sterile conditions, and
an inert argon or nitrogen atmosphere.

XII. INTEGRATED SEQUENCING STRATEGY 

A. Initial Mapping Strategy 

[0246] As indicated above, although the VLSIPS may be applied to sequencing embodiments, it is often useful to
integrate other concepts to simplify the sequencing. For example, nucleic acids may be easily sequenced by careful
selection of the vectors and hosts used for amplifying and generating the specific target sequences. For example, it
may be desired to use specific vectors which have been designed to interact most efficiently with the VLSIPS substrate.
This is also important in fingerprinting and mapping strategies. For example, vectors may be carefully selected having
particular complementary sequences which are designed to attach to a generic or specific oligomer on the substrate.
This is also applicable to situations where it is desired to target particular sequences to specific locations on the matrix.

[0247] In one embodiment, unnatural oligomers may be used to target natural probes to specific locations on the
VLSIPS substrate. In addition, particular probes may be generated for the mapping embodiment which are designed
to have specific combinations of characteristics. For example, the construction of a mapping substrate may depend
upon use of another automated apparatus which takes clones isolated from a chromosome walk and attaches them
individually or in bulk to the VLSIPS substrate.

[0248] In another embodiment, a variety of specific vectors having known and particular "targeting" sequences ad- 
jacent the cloning sites may be individually used to clone a selected probe, and the isolated probe will then be targetable 
to a site on the VLSIPS substrate with a sequence complementary to the "target" sequence. 
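The targeting idea in paragraphs [0246] to [0248] rests on complementarity: a clone carrying a given targeting sequence is captured at the site whose immobilized oligomer is complementary to it. A minimal Python sketch of that lookup follows. It assumes perfect Watson-Crick pairing of antiparallel strands (both sequences written 5' to 3') and ignores mismatches and hybridization conditions; the names `reverse_complement` and `capture_site` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def capture_site(targeting_seq, array_layout):
    """Return the array site whose immobilized oligomer would pair perfectly
    with targeting_seq, or None if no site matches.

    array_layout: dict of site -> immobilized oligomer (5'->3').
    """
    wanted = reverse_complement(targeting_seq)
    for site, oligo in array_layout.items():
        if oligo.upper() == wanted:
            return site
    return None
```

For instance, a clone tagged with `AACG` would be directed to a site carrying `CGTT`. In practice the targeting sequences would also be chosen so that no tag pairs well with any site other than its own, which this exact-match lookup does not check.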

B. Selection of Smaller Clones 



[0249] In the fingerprinting and mapping embodiments, the selection of probes may be very important. Significant
mathematical analysis may be applied to determine which specific sequences should be used as those probes. Of
course, for fingerprinting use, sequences that show significant heterogeneity across the human population would be
preferred. The specific sequences most favorably used will tend to be single-copy sequences within the genome, and
more specifically single-copy sequences that have low cross-hybridization potential to other sequences in the genome
(i.e., not members of a closely related multigene family).
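One simple form of the analysis mentioned in paragraph [0249] is to count how often each candidate probe occurs in a reference sequence, on both strands, and keep only the single-copy candidates. The Python sketch below does exactly that with exact string matching. It is a simplification: real cross-hybridization also arises from near matches, which it does not model. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def copy_number(genome, probe):
    """Exact-match copies of probe on either strand of genome."""
    rc = probe.translate(_COMPLEMENT)[::-1]
    hits = _count_overlapping(genome, probe)
    if rc != probe:  # a palindromic probe would otherwise be counted twice
        hits += _count_overlapping(genome, rc)
    return hits


def single_copy_probes(genome, candidates):
    """Keep only candidates that occur exactly once in the genome."""
    return [p for p in candidates if copy_number(genome, p) == 1]
```

In the toy sequence `ATGCATTTGGCA`, the candidate `TTTG` occurs once and is kept, while `GCA` occurs twice on the given strand and once more as its complement `TGC`, so it is rejected.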

[0250] Various hybridization selection procedures may be applied to select sequences which tend not to be repeated
within a genome, and thus would tend to be conserved across individuals. For example, hybridization selections may
be made for non-repetitive and single-copy sequences. See, e.g., Britten and Kohne (1968) "Repeated Sequences in
DNA," Science 161:529-540. On the other hand, it may be desired under certain circumstances to use repeated
sequences. For example, where a fingerprint may be used to identify or distinguish different species, or where repetitive
sequences may be diagnostic of specific species, repetitive sequences may be desired for inclusion in the fingerprinting
probes. In either case, the sequencing capability will greatly assist in the selection of appropriate sequences to be
used as probes.

[0251] Also as indicated above, various means for constructing an appropriate substrate may involve either
mechanical or automated procedures. The standard VLSIPS automated procedure involves synthesizing oligonucleotides or
short polymers directly on the substrate. In various other embodiments, it is possible to attach separately synthesized
reagents onto the matrix in an ordered array. Other circumstances may lend themselves to transferring a pattern from a
Petri plate onto a solid substrate. Also, there are methods for site-specifically directing collections of reagents to specific
locations using unnatural nucleotides or equivalent sorts of targeting molecules.

[0252] While a brute-force manual transfer process may be used, sequentially attaching various samples to
successive positions, instrumentation for automating such procedures may also be devised. The automated system for
performing such transfers would preferably be relatively easily designed and conceptually easily understood.

XIII. COMMERCIAL APPLICATIONS 

A. Sequencing 

[0253] As indicated above, sequencing may be performed either de novo or as a verification of another sequencing
method. The present hybridization technology provides the ability to sequence nucleic acids and polynucleotides de
novo, or as a means to verify either the Maxam and Gilbert chemical sequencing technique or the Sanger and Coulson
dideoxy sequencing technique. The hybridization method is useful to verify sequences determined by any other
sequencing technique and to closely compare two similar sequences, e.g., to identify and locate sequence differences.
[0254] Of course, sequencing can be very important in many different environments. For example, it will
be useful in determining the genetic sequence of particular markers in various individuals. In addition, polymers may
be used as markers or as information-containing molecules to encode information. For example, a short polynucleotide
sequence indicating the manufacturer, date, and location of manufacture of a product may be included in large bulk
production samples. Various drugs, for instance, may be encoded with this information using a small number of molecules
in a batch; a pill may contain from 10 to 100 to 1,000 or more very short, small molecules encoding this information.
When necessary, this information may be decoded from a sample of the material using a polymerase chain reaction
(PCR) or other amplification method. This encoding system may be used to provide the origin of large bulk samples
without significantly affecting the properties of those samples. Chemical samples may likewise be encoded by this
method, thereby providing a means for identifying the source and manufacturing details of lots. The origin of bulk
hydrocarbon samples may be encoded. Production lots of organic compounds such as benzene or plastics may be encoded
with a short polymer molecule. Foodstuffs may also be encoded using similar marking molecules. Even toxic waste samples
can be encoded to determine their source or origin. In this way, proper disposal can be traced or more easily enforced.
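The text does not specify how information would be written into a marker sequence. As a purely hypothetical illustration, the Python sketch below packs each byte of a label into four bases using the arbitrary mapping A=00, C=01, G=10, T=11, and decodes it again. A practical tag would also need flanking primer-binding sites for the PCR recovery step and would avoid problematic sequences such as long homopolymer runs; neither is handled here.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def encode(text):
    """Encode a UTF-8 string as a DNA sequence, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def decode(seq):
    """Invert encode(); seq length must be a multiple of 4."""
    bits = "".join(f"{BASES.index(base):02b}" for base in seq)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

For example, `encode("A")` returns `"CAAC"`, since the byte 65 is `01 00 00 01` in binary, and a six-character lot label becomes a 24-base sequence.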

[0255] Similar sorts of encoding may be provided by fingerprinting-type analysis. Whether the resolution is absolute 
or less so, the concept of coding information on molecules such as nucleic acids, which can be amplified and later 
decoded, may be a very useful and important application. 

[0256] This technology also provides the ability to include markers for origins of biological materials. For example, 
a patented animal line may be transformed with a particular unnatural sequence which can be traced back to its origin.
With a selection of multiple markers, the likelihood could be negligible that a combination of markers would have
independently arisen from a source other than the patented or specifically protected source. This technique may provide 
a means for tracing the actual origin of particular biological materials. Bacteria, plants, and animals will be subject to 
marking by such encoding sequences. 
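The claim that an independent origin becomes negligible with several markers can be made concrete with a back-of-envelope estimate. The sketch below assumes an unrelated genome of random sequence with equal base frequencies, independent markers, and exact matching on either strand. Under those assumptions, the chance that a marker of length L appears somewhere by accident in a genome of G bases is at most about 2G x 4^(-L), and the chance that all m markers appear is that bound raised to the m-th power. These are illustrative assumptions, not data from the text.

```python
def chance_all_markers(marker_len, n_markers, genome_size):
    """Rough upper bound on the chance that an unrelated genome carries all
    n_markers specific marker sequences of length marker_len by accident.

    Assumes random sequence with equal base frequencies, independent
    markers, and exact matches on either strand.
    """
    # Expected accidental matches per marker: 2 strands x genome_size start
    # positions x probability 4**-marker_len of matching at one position.
    per_marker = min(1.0, 2 * genome_size * 0.25 ** marker_len)
    # Independent markers multiply.
    return per_marker ** n_markers
```

For three 20-nucleotide markers and a genome of 3 x 10^9 bases, the bound is about 1.6 x 10^-7, and each additional marker lowers it by more than two further orders of magnitude.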

B. Fingerprinting 

[0257] As indicated above, fingerprinting technology may also be used for data encryption. Moreover, fingerprinting
allows for significant identification of particular individuals. Where the fingerprinting technology is standardized and
used for identification of large numbers of people, related equipment and peripheral processing will be developed to
accompany the underlying technology. For example, specific equipment may be developed for automatically taking a
biological sample and generating or amplifying the information molecules within the sample to be used in fingerprinting
analysis. Moreover, the fingerprinting substrate may be mass produced using particular types of automatic equipment.
Synthetic equipment may produce the entire matrix simultaneously by stepwise synthetic methods as provided by the
VLSIPS technology. The attachment of specific probes onto a substrate may also be automated. 
[0258] In addition, peripheral processing may be important and may be dedicated to this specific application. Thus,
automated equipment for producing the substrates may be designed, or particular systems which take in a biological
sample and output either a computer readout or an encoded instrument, e.g., a card or document which indicates the
information and can provide that information to others. An identification card having a short magnetic strip with a few
million bits may be used to provide individual identification and important medical information useful in a medical emergency.
[0259] In fact, data banks may be set up to correlate all of this fingerprinting information with medical information.
This may allow for the determination of correlations between various medical problems and specific DNA sequences.
By collating large populations of medical records with genetic information, genetic propensities and genetic
susceptibilities to particular medical conditions may be identified. Moreover, with standardization of substrates, the
micro-encoding data may also be standardized to reproduce the information from a centralized data bank or on an encoding
device carried on an individual person. On the other hand, if the fingerprinting procedure is sufficiently quick and routine,
every hospital may routinely perform a fingerprinting operation and from that determine many important medical
parameters for an individual.

[0260] In particular industries, the VLSIPS sequencing, fingerprinting, or mapping technology will be particularly 
appropriate. As mentioned above, agricultural livestock suppliers may be able to encode and determine whether their 
particular strains are being used by others. By incorporating particular markers into their genetic stocks, suppliers can
use the markers to indicate the origin of genetic material. This is applicable to seed producers, livestock producers, and other suppliers
of medical or agricultural biological materials. 

[0261] This may also be useful in identifying individual animals or plants. For example, these markers may be useful 
in determining whether certain fish return to their original breeding grounds, whether sea turtles always return to their 
original birthplaces, or to determine the migration patterns and viability of populations of particular endangered species.
It would also provide means for tracking the sources of particular animal products. For example, it might be useful for
determining the origins of controlled animal substances such as elephant ivory or particular bird populations whose
importation or exportation is controlled. 
[0262] As indicated above, polymers may be used to encode important information on source, batch, and supplier.
This is described in greater detail, e.g., in "Applications of PCR to industrial problems" (1990), Chemical and
Engineering News 68:145. In fact, the synthetic method can be applied to the storage of enormous amounts of information.
Small substrates may encode enormous amounts of information, and its recovery will make use of the inherent
replication capacity of nucleic acids. For example, with regions of 10 µm × 10 µm, 1 cm² holds 10⁶ regions. In theory,
the entire human genome could be attached in 1000-nucleotide segments on a 3 cm² surface. Genomes of endangered
species may be stored on these substrates.
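The storage figures in paragraph [0262] follow from simple arithmetic, reproduced in the Python sketch below. The haploid human genome size of about 3 × 10^9 base pairs is an assumption supplied here, since the text does not state it; the 10 µm region size and the 1000-nucleotide segment length come from the text.

```python
UM_PER_CM = 10_000  # 1 cm = 10,000 micrometers


def regions_per_cm2(region_side_um):
    """Number of square synthesis regions of the given side that fit in 1 cm^2."""
    per_side = UM_PER_CM // region_side_um
    return per_side * per_side


def area_needed_cm2(genome_bp, segment_nt, region_side_um):
    """Surface area needed to hold a genome as one segment per region."""
    segments = -(-genome_bp // segment_nt)  # ceiling division
    return segments / regions_per_cm2(region_side_um)


print(regions_per_cm2(10))                       # 1000000
print(area_needed_cm2(3_000_000_000, 1000, 10))  # 3.0
```

Three million segments at one million regions per square centimeter give the 3 cm² figure; the estimate ignores overlaps between segments and any redundancy that a practical store would add.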

[0263] Fingerprinting may also be used for genetic tracing or for identifying individuals for forensic science purposes. 
See, e.g., Morris, J. et al. (1989) "Biostatistical Evaluation of Evidence From Continuous Allele Frequency Distribution 
DNA Probes in Reference to Disputed Paternity and Identity," J. Forensic Science 34:1311-1317, and references
provided therein.

[0264] In addition, high-resolution fingerprinting allows particular samples to be distinguished at high resolution.
As indicated above, new cell classifications may be defined based on combinations of a large number of properties.
Similar applications will be found in distinguishing different species of animals or plants. In fact, microbial identification
may become dependent on characterization of the genetic content. Tumors or other cells exhibiting abnormal physiology
will be detectable by use of the present invention. Also, knowing the genetic fingerprint of a microorganism may provide
very useful information on how to treat an infection by such organism.

[0265] Modifications of the fingerprint embodiments may be used to diagnose the condition of the organism. For 
example, a blood sample is presently used for diagnosing any of a number of different physiological conditions. A multi- 
dimensional fingerprinting method made available by the present invention could become a routine means for diag- 
nosing an enormous number of physiological features simultaneously. This may revolutionize the practice of medicine 
in providing information on an enormous number of parameters together at one time. Similarly, knowledge of genetic
predisposition may also revolutionize the practice of medicine by providing a physician with the ability to predict the
likelihood of particular medical conditions arising at any particular moment. It also provides the ability to apply preventative
medicine.

[0266] Also available are kits with the reagents useful for performing sequencing, fingerprinting, and mapping pro- 
cedures. The kits will have various compartments with the desired necessary reagents, e.g., substrate, labeling rea- 
gents for target samples, buffers, and other useful accompanying products. 

C. Mapping 

[0267] The present invention also provides the means for mapping sequences within enormous stretches of se- 
quence. For example, nucleotide sequences may be mapped within enormous chromosome size sequence maps. For 
example, it would be possible to map a chromosomal location within the chromosome which contains hundreds of 
millions of nucleotide base pairs. In addition, the mapping and fingerprinting embodiments allow for testing of chromo- 
somal translocations, one of the standard problems for which amniocentesis is performed. 

[0268] The present invention will be better understood by reference to the following illustrative examples. The fol- 
lowing examples are offered by way of illustration and not by way of limitation. 

[0269] Relevant techniques are described in PCT publication no. WO90/15070, published December 13, 1990; and PCT
publication no. WO91/07087, published May 30, 1991.

[0270] Also, additional relevant techniques are described, e.g., in Sambrook, J., et al. (1989) Molecular Cloning: A
Laboratory Manual, 2d Ed., vols 1-3, Cold Spring Harbor Press, New York; Greenstein and Winitz (1961) Chemistry
of the Amino Acids, Wiley and Sons, New York; Bodanszky, M. (1988) Peptide Chemistry: A Practical Textbook,
Springer-Verlag, New York; Harlow and Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Press, New York;
Glover, D. (ed.) (1987) DNA Cloning: A Practical Approach, vols 1-3, IRL Press, Oxford; Bishop and Rawlings (1987)
Nucleic Acid and Protein Sequence Analysis: A Practical Approach, IRL Press, Oxford; Hames and Higgins (1985)
Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford; Wu et al. (1989) Recombinant DNA Methodology,
Academic Press, San Diego; Goding (1986) Monoclonal Antibodies: Principles and Practice (2d ed.), Academic Press,
San Diego; Finegold and Barron (1986) Bailey and Scott's Diagnostic Microbiology (7th ed.), Mosby Co., St. Louis;
Collins et al. (1989) Microbiological Methods (6th ed.), Butterworth, London; Chaplin and Kennedy (1986) Carbohydrate
Analysis: A Practical Approach, IRL Press, Oxford; Van Dyke (ed.) (1985) Bioluminescence and Chemiluminescence:
Instruments and Applications, vol 1, CRC Press, Boca Raton; and Ausubel, et al. (ed.) (1990) Current Protocols
in Molecular Biology, Greene Publishing and Wiley-Interscience, New York.

EXAMPLES 



[0271] The following examples are provided to illustrate the efficacy of the inventions herein. All operations were 
conducted at about ambient temperatures and pressures unless indicated to the contrary. 
POLYNUCLEOTIDE SEQUENCING

1. HPLC of the photolysis of 5'-O-nitroveratryl thymidine

[0272] In order to determine the time for photolysis of 5'-O-nitroveratryl thymidine to thymidine, a 100 solution of
NVO-Thym-OH (5'-O-nitroveratryl thymidine) in dioxane was made and ~200 µl aliquots were irradiated (in a quartz cuvette,
1 cm × 2 mm) at 362.3 nm for 20 sec, 40 sec, 60 sec, 2 min, 5 min, 10 min, 15 min, and 20 min. The resulting irradiated
mixtures were then analyzed by HPLC using a Varian MicroPak SP column (C18 analytical) at a flow rate of 1 ml/min
and a solvent system of 40% CH3CN and 60% water. Thymidine has a retention time of 1.2 min and NVO-Thym-OH
has a retention time of 2.1 min. After 10 min of exposure, deprotection was complete.

2. Preparation and Detection of Thymidine-Cytidine dimer (FITC) 
[0273] The reaction is illustrated: 





[0274] To an aminopropylated glass slide (standard VLSIPS) was added a mixture of the following: 

12.2 mg of NVO-Thym-CO2H (IX)
3.4 mg of HOBT (N-hydroxybenzotriazole)
8.8 µl DIEA (diisopropylethylamine)
11.1 mg BOP reagent
2.5 ml DMF

[0275] After 2 h coupling time (standard VLSIPS), the plate was washed, acetylated with acetic anhydride/pyridine,
washed, dried, and photolyzed in dioxane at 362 nm at 14 mW/cm² for 10 min using a 500 µm checkerboard mask.
The slide was then treated with a mixture of the following:



107 mg of FMOC-amine modified C (III)
21 mg of tetrazole
1 ml anhydrous CH3CN

[0276] After being treated for approximately 8 min, the slide was washed with CH3CN, dried, and oxidized with
I2/H2O/THF/lutidine for 1 min. The slide was again washed, dried, and treated for 30 min with a 20% solution of DBU.
After thorough rinsing of the slide, it was next exposed to a FITC solution (1 mM fluorescein isothiocyanate
[FITC] in DMF) for 50 min, then washed, dried, and examined by fluorescence microscopy. This reaction is illustrated:





3. Preparation and Detection of Thymidine-Cytidine dimer (Biotin) 

[0277] An aminopropyl glass slide was soaked in a solution of ethylene oxide (20% in DMF) to generate a hydroxylated
surface. To the slide was added a mixture of the following:

32 mg of NVO-T-OCED (X) 

11 mg of tetrazole 

0.5 ml of anhydrous CH3CN

[0278] After 8 min, the plate was rinsed with acetonitrile, then oxidized with I2/H2O/THF/lutidine for 1 min, washed,
and dried. The slide was then exposed to a 1:3 mixture of acetic anhydride:pyridine for 1 h, then washed and dried.
The substrate was then photolyzed in dioxane at 362 nm at 14 mW/cm² for 10 min using a 500 µm checkerboard
mask, dried, and then treated with a mixture of the following:

65 mg of biotin modified C (IV) 

11 mg of tetrazole 

0.5 ml anhydrous CH3CN

[0279] After 8 min, the slide was washed with CH3CN, then oxidized with I2/H2O/THF/lutidine for 1 min, washed, and
then dried. The slide was then soaked for 30 min in a PBS/0.05% Tween 20 buffer and the solution then shaken off.
The slide was next treated with FITC-labeled streptavidin at 10 µg/ml in the same buffer system for 30 min. After this
time, the streptavidin-buffer solution was rinsed off with fresh PBS/0.05% Tween 20 buffer, and the slide was finally
agitated in distilled water for about 1/2 h. After drying, the slide was examined by fluorescence microscopy (see Fig.
2 and Fig. 3).

4. substrate preparation 

[0280] Before attachment of reactive groups it is preferred to clean the substrate which is, in a preferred embodiment, 
a glass substrate such as a microscope slide or cover slip. A roughened surface will be useable but a plastic or other 
solid substrate is also appropriate. According to one embodiment the slide is soaked in an alkaline bath consisting of, 
e.g., 1 liter of 95% ethanol with 120 ml of water and 120 grams of sodium hydroxide for 12 hours. The slides are washed
with a buffer and under running water, allowed to air dry, and rinsed with a solution of 95% ethanol. 
[0281] The slides are then aminated with, e.g., aminopropyltriethoxysilane for the purpose of attaching amino groups 
to the glass surface on linker molecules, although other omega-functionalized silanes could also be used for this
purpose. In one embodiment, 0.1% aminopropyltriethoxysilane is utilized, although solutions with concentrations from
10⁻⁷% to 10% may be used, with about 10⁻³% to 2% preferred. A 0.1% mixture is prepared by adding 100 microliters (µl)
of aminopropyltriethoxysilane to 100 ml of a 95% ethanol/5% water mixture. The mixture is agitated at about ambient
temperature on a rotary shaker for an appropriate amount of time, e.g., about 5 minutes. 500 µl of this mixture
is then applied to the surface of one side of each cleaned slide. After 4 minutes or more, the slides are decanted of
this solution and thoroughly rinsed three times or more by dipping in 100% ethanol. 
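The 0.1% silane recipe is a simple volume calculation. The Python sketch below uses hypothetical helpers, not part of the described protocol: one computes the silane volume for a given solvent volume and target percentage, treating the percentage relative to the solvent volume as the recipe does, and the other reports the strict volume/volume fraction of the final mixture.

```python
def silane_volume_ul(solvent_ml, percent):
    """Microliters of silane to add to solvent_ml of solvent for the given
    percentage, expressed relative to the solvent volume (as in the recipe)."""
    return solvent_ml * 1000 * percent / 100


def final_percent_vv(silane_ul, solvent_ml):
    """Strict volume/volume percentage of silane in the final mixture."""
    return 100 * silane_ul / (silane_ul + solvent_ml * 1000)
```

`silane_volume_ul(100, 0.1)` gives 100 µl, matching the text; the strict v/v fraction of that mixture is about 0.0999%, a difference that is negligible for this purpose.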

[0282] After the slides dry, they are heated in a 110-120°C vacuum oven for about 20 minutes, and then allowed to 
cure at room temperature for about 12 hours in an argon environment. The slides are then dipped into DMF (dimeth- 
ylformamide) solution, followed by a thorough washing with methylene chloride. 

5. linker attachment, blocking of free sites 

[0283] The aminated surface of the slide is then exposed to about 500 µl of, for example, a 30 millimolar (mM) solution
of NVOC-nucleotide-NHS (N-hydroxysuccinimide) in DMF for attachment of an NVOC-nucleotide to each of the amino
groups. See, e.g., SIGMA Chemical Company for various nucleotide derivatives. The surface is washed with, for ex- 
ample, DMF, methylene chloride, and ethanol. 

[0284] Any unreacted aminopropyl silane on the surface, i.e., those amino groups which have not had the NVOC- 
nucleotide attached, are now capped with acetyl groups (to prevent further reaction) by exposure to a 1:3 mixture of 
acetic anhydride in pyridine for 1 hour. Other materials which may perform this residual capping function include
trifluoroacetic anhydride, formic acetic anhydride, or other reactive acylating agents. Finally, the slides are washed again
with DMF, methylene chloride, and ethanol. 

6. synthesis of eight trimers of C and T 

[0285] Fig. 4 illustrates a possible synthesis of the eight trimers of the two-monomer set: cytosine and thymine
(represented by C and T, respectively). A glass slide bearing silane groups terminating in 6-nitroveratryloxycarboxamide
(NVOC-NH) residues is prepared as a substrate. Active esters (pentafluorophenyl, OBt, etc.) of cytosine and thymine
protected at the 5' hydroxyl group with NVOC are prepared as reagents. While not pertinent to this example, if side
chain protecting groups are required for the monomer set, these must not be photoreactive at the wavelength of light
used to deprotect the primary chain.

[0286] For a monomer set of size $n$, $n \times \ell$ cycles are required to synthesize all possible sequences of length
$\ell$. A cycle consists of:

1. Irradiation through an appropriate mask to expose the 5'-OH groups at the sites where the next residue is to be 
added, with appropriate washes to remove the by-products of the deprotection. 

2. Addition of a single activated and protected (with the same photochemically-removable group) monomer, which 
will react only at the sites addressed in step 1, with appropriate washes to remove the excess reagent from the
surface. 

[0287] The above cycle is repeated for each member of the monomer set until each location on the surface has been 
extended by one residue in one embodiment. In other embodiments, several residues are sequentially added at one 
location before moving on to the next location. Cycle times will generally be limited by the coupling reaction rate, now 
as short as about 10 min in automated oligonucleotide synthesizers. This step is optionally followed by addition of a 
protecting group to stabilize the array for later testing. For some types of polymers (e.g., peptides), a final deprotection
of the entire surface (removal of photoprotective side chain groups) may be required. 

[0288] More particularly, as shown in Fig. 4A, the glass 20 is provided with regions 22, 24, 26, 28, 30, 32, 34, and 
36. Regions 30, 32, 34, and 36 are masked, indicated by the hatched regions, as shown in Fig. 4B and the glass is 
irradiated by the bright regions 22, 24, 26, and 28, and exposed to a reagent containing a photosensitive blocked C 
(e.g., cytosine derivative), with the resulting structure shown in Fig. 4C. The substrate is carefully washed and the 
reactants removed. Thereafter, regions 22, 24, 26, and 28 are masked, as indicated by the hatched region, the glass 
is irradiated (as shown in Fig. 4D), as indicated by the bright regions, at 30, 32, 34, and 36, and exposed to a photo- 
sensitive blocked reagent containing T (e.g., thymine derivative), with the resulting structure shown in Fig. 4E. The 
process proceeds, consecutively masking and exposing the sections as shown until the structure shown in Fig. 4M is 
obtained. The glass is irradiated and the terminal groups are, optionally, capped by acetylation. As shown, all possible 
trimers of cytosine/thymine are obtained. 
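The masking sequence of Fig. 4 can be simulated in a few lines of Python. In this sketch (an illustration, not the described apparatus), each synthesis site is indexed by an integer; in layer j a site receives the monomer selected by the j-th base-n digit of its index, and one mask is counted per monomer per layer. The first layer matches Fig. 4B to 4E, where half of the regions receive C and the other half receive T. The function name `masked_synthesis` is hypothetical.

```python
from itertools import product


def masked_synthesis(monomers, length):
    """Simulate layer-by-layer light-directed synthesis of every sequence of
    the given length over the monomer set.

    Returns the sequence built at each site and the number of masks used.
    """
    n = len(monomers)
    n_sites = n ** length
    sites = [""] * n_sites
    masks_used = 0
    for layer in range(length):
        block = n ** (length - layer - 1)
        for index, monomer in enumerate(monomers):
            # One mask: illuminate only sites whose digit for this layer
            # selects the current monomer, then couple that monomer there.
            for site in range(n_sites):
                if (site // block) % n == index:
                    sites[site] += monomer
            masks_used += 1
    return sites, masks_used


if __name__ == "__main__":
    sites, masks = masked_synthesis("CT", 3)
    expected = sorted("".join(p) for p in product("CT", repeat=3))
    print(sorted(sites) == expected, masks)  # True 6
```

For C and T trimers the simulation yields all eight sequences with six masks, the $n \times \ell$ count given in paragraph [0286]. Calling `masked_synthesis("ACGT", 2)` likewise produces the 16 dimers with eight masks.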

[0289] In this example, no side chain protective group removal is necessary, as might be common in modified nu- 
cleotides. If it is desired, side chain deprotection may be accomplished by treatment with ethanedithiol and trifluoro- 
acetic acid. 

[0290] In general, the number of steps needed to obtain a particular polymer chain is defined by:

$$n \times \ell \qquad (1)$$

where:

$n$ = the number of monomers in the basis set of monomers, and
$\ell$ = the number of monomer units in a polymer chain.

[0291] Conversely, the synthesized number of sequences of length $\ell$ will be:

$$n^{\ell} \qquad (2)$$

[0292] Of course, greater diversity is obtained by using masking strategies which will also include the synthesis of
polymers having a length of less than $\ell$. If, in the extreme case, all polymers having a length less than or equal to
$\ell$ are synthesized, the number of polymers synthesized will be:

$$n^{\ell} + n^{\ell-1} + \cdots + n^{1} \qquad (3)$$

[0293] The maximum number of lithographic steps needed will generally be $n$ for each "layer" of monomers, i.e., the
total number of masks (and, therefore, the number of lithographic steps) needed will be $n \times \ell$. The size of the
transparent mask regions will vary in accordance with the area of the substrate available for synthesis and the number
of sequences to be formed. In general, the size of the synthesis areas will be:

$$\text{size of synthesis areas} = \frac{A}{S}$$

where: 

A is the total area available for synthesis; and 
S is the number of sequences desired in the area. 
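A worked example of the area relation follows. It assumes, hypothetically, a 1 cm × 1 cm synthesis region and one region per decamer of A, C, G and T; the function name is illustrative.

```python
def synthesis_area(total_area, n_sequences):
    """Area available per synthesis region: A / S (same area units as total_area)."""
    return total_area / n_sequences


# Hypothetical example: all 4**10 decamers on a 1 cm x 1 cm region (1 cm² = 1e8 µm²).
area_um2 = synthesis_area(1e8, 4 ** 10)
side_um = area_um2 ** 0.5  # edge length if each region is square
print(f"{area_um2:.1f} µm² per region, about {side_um:.1f} µm on a side")
```

At this density each decamer would occupy a square region roughly 10 µm on a side.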

[0294] It will be appreciated by those of skill in the art that the above method could readily be used to simultaneously 
produce thousands or millions of oligomers on a substrate using the photolithographic techniques disclosed herein. 
Consequently, the method results in the ability to practically test large numbers of, for example, di, tri, tetra, penta, 
hexa, hepta, octa, nona, deca, even dodecanucleotides, or larger polynucleotides. 

[0295] The above example illustrates the method as a manual procedure. It will of course be appreciated
that automated or semi-automated methods could be used. The substrate would be mounted in a flow cell for automated
addition and removal of reagents, to minimize the volume of reagents needed, and to more carefully control reaction
conditions. Successive masks may be applied manually or automatically. See, e.g., PCT publication no. WO90/15070.
7. labeling of target 



[0296] The target oligonucleotide can be labeled using standard procedures referred to above. As discussed, for
certain situations, a reagent which recognizes interaction, e.g., ethidium bromide, may be provided in the detection
step. Alternatively, fluorescence labeling techniques may be applied, see, e.g., Smith et al. (1986) Nature 321:674-679;
and Prober et al. (1987) Science 238:336-341, with modifications as appropriate for the label selected.

8. dimers of A, C, G, and T 



[0297] The described technique may be applied, with photosensitive blocked nucleotides corresponding to adenine, 
cytosine, guanine, and thymine, to make combinations of polynucleotides consisting of each of the four different nu- 
cleotides. All 16 possible dimers would be made using a minor modification of the described method. 

9. 10-mers of A, C, G, and T 

[0298] The described technique for making dimers of A, C, G, and T may be further extended to make longer
oligonucleotides. The automated system described, e.g., in PCT publication no. WO90/15070 can be adapted to make all
possible 10-mers composed of the 4 nucleotides A, C, G, and T. The photosensitive, blocked nucleotide analogues 
have been described above, and would be readily adaptable to longer oligonucleotides. 

10. specific recognition hybridization to 10-mers 



[0299] The described hybridization conditions are directly applicable to the sequence specific recognition reagents 
attached to the substrate, produced as described immediately above. The 10-mers have an inherent property of
hybridizing to a complementary sequence. For optimum discrimination between full matches and mismatches, the
hybridization conditions should be carefully selected, as described above. The conditions should be carefully controlled
and the parameters titrated to determine the optimum collective conditions.

11. hybridization 



[0300] Hybridization conditions are described in detail, e.g., in Hames and Higgins (1985) Nucleic Acid Hybridisation:
A Practical Approach; and the considerations for selecting particular conditions are described, e.g., in Wetmur and
Davidson (1968) J. Mol. Biol. 31:349-370, and Wood et al. (1985) Proc. Natl. Acad. Sci. USA 82:1585-1588. As
described above, conditions are desired which can distinguish matching along the entire length of the probe from cases
where there are one or more mismatched bases. The length of incubation and conditions will be similar, in many respects, to
the hybridization conditions used in Southern blot transfers. Typically, the GC bias may be minimized by the introduction
of appropriate concentrations of the alkylammonium buffers, as described above.

[0301] Titration of the temperature and other parameters is desired to determine the optimum conditions for specificity
and distinguishability of absolutely matched hybridization from mismatched hybridization. 

[0302] A fluorescently labeled target or set of targets is generated, as described in Prober et al. (1987) Science
238:336-341, or Smith et al. (1986) Nature 321:674-679. Preferably, the target or targets are of the same length as,
or slightly longer than, the oligonucleotide probes attached to the substrate, and they will have known sequences. Thus
only a few of the probes will hybridize perfectly with the target, and which particular ones do will be known.
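Because the target sequence is known, the probes expected to hybridize perfectly can be predicted in advance. The sketch below assumes DNA probes and target written 5' to 3', Watson-Crick pairing only, and a probe set of all 4-mers; the target sequence and function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Watson-Crick reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def perfect_match_probes(target, probes):
    """Probes that can pair over their full length with some stretch of the target.

    A probe pairs antiparallel with the target, so it matches wherever its
    reverse complement occurs in the target (both written 5'->3')."""
    return [p for p in probes if reverse_complement(p) in target]


target = "ACGTTGCA"                                         # hypothetical known target
probes = ["".join(p) for p in product("ACGT", repeat=4)]   # all 256 4-mers
hits = perfect_match_probes(target, probes)
print(len(hits), hits)  # 5 ['AACG', 'ACGT', 'CAAC', 'GCAA', 'TGCA']
```

Only 5 of the 256 probes are expected to light up, one for each 4-base window of the 8-base target.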
[0303] The substrate and probes are incubated under appropriate conditions for a sufficient period of time to allow
hybridization to proceed to completion. The time is measured to determine when the probe-target hybridizations have
reached completion. A salt buffer which minimizes GC bias is preferred, incorporating, e.g., tetramethylammonium or
tetraethylammonium ion at between about 2.4 and 3.0 M. See Wood et al. (1985) Proc. Natl. Acad. Sci. USA
82:1585-1588. The hybridization time is typically at least about 30 min, and may be as long as about 1-5 days. Typically,
very long matches will hybridize more quickly and very short matches less quickly, depending upon relative target and
probe concentrations. The hybridization will be performed under conditions where the reagents are stable for that time
duration.

[0304] Upon maximal hybridization, the conditions for washing are titrated. Three parameters initially titrated are 
time, temperature, and cation concentration of the wash step. The matrix is scanned at various times to determine the 
conditions at which the distinguishability between true perfect hybrid and mismatched hybrid is optimized. These con- 
ditions will be preferred in the sequencing embodiments. 
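The wash titration of paragraph [0304] amounts to choosing, among the scanned conditions, the one that best separates perfect-match from mismatch signals. This sketch uses the ratio of mean signals as the discrimination score, which is an assumed metric rather than one specified here, and the readings are invented for illustration.

```python
from statistics import mean


def discrimination(perfect, mismatch):
    """Ratio of mean perfect-match signal to mean mismatch signal (assumes mismatch signal > 0)."""
    return mean(perfect) / mean(mismatch)


def best_wash_condition(scans):
    """Pick the wash condition with the highest perfect/mismatch discrimination.

    scans maps (time_min, temp_C, cation_M) -> (perfect-match signals, mismatch signals)."""
    return max(scans, key=lambda cond: discrimination(*scans[cond]))


# Invented numbers for illustration only; not measured data.
scans = {
    (10, 25, 1.0): ([900, 950, 880], [600, 650, 620]),
    (10, 37, 1.0): ([800, 820, 790], [200, 210, 190]),
    (30, 45, 0.5): ([300, 320, 310], [150, 160, 155]),
}
print(best_wash_condition(scans))  # (10, 37, 1.0)
```

The condition with the strongest absolute signal is not necessarily the best one; here the mildest wash keeps too much mismatch signal.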
12. positional detection of specific interaction

[0305] As indicated above, the detection of specific interactions may be performed by detecting the positions where 
the labeled target sequences are attached. Where the label is a fluorescent label, the apparatus described, e.g., in PCT
publication no. WO90/15070 may be advantageously applied. In particular, the synthetic processes described above 
will result in a matrix pattern of specific sequences attached to the substrate, and a known pattern of interactions can 
be converted to corresponding sequences. 
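Because the synthesis pattern fixes which sequence sits at each position, converting a scanned image into sequences is essentially a lookup. A minimal sketch, assuming a simple intensity threshold and a hypothetical 2 × 2 layout; the function name and values are illustrative.

```python
def hybridized_sequences(layout, intensities, threshold):
    """Convert bright array positions into the probe sequences synthesized there.

    layout: (row, col) -> probe sequence, known from the masking pattern.
    intensities: (row, col) -> scanned fluorescence signal."""
    return sorted(layout[pos] for pos, signal in intensities.items() if signal >= threshold)


layout = {(0, 0): "AACG", (0, 1): "ACGA", (1, 0): "CAAC", (1, 1): "TTTT"}
intensities = {(0, 0): 1200, (0, 1): 90, (1, 0): 1100, (1, 1): 60}
print(hybridized_sequences(layout, intensities, threshold=500))  # ['AACG', 'CAAC']
```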

[0306] In an alternative embodiment, a separate reagent which interacts differently with unhybridized probes and with
probe-target hybrids can indicate where interaction occurs or does not occur. A single-strand specific reagent will indicate
where no interaction has taken place, while a double-strand specific reagent will indicate where interaction has taken
place. An intercalating dye, e.g., ethidium bromide, may be used to indicate the positions of specific interaction.

13. analysis 

[0307] Conversion of the positional data into sequence specificity provides the set of subsequences, which may be
analyzed by overlapping segments as described above. Analysis is provided by the methodology described
above, or using, e.g., software available from the Genetic Engineering Center, P.O. Box 794, 11000 Belgrade, Yugoslavia
(Yugoslav group). See also Macevicz, PCT publication no. WO 90/04652.
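Overlap analysis of the detected subsequences can be sketched as chaining k-mers that overlap by k − 1 bases. This is a simplified illustration, not the Yugoslav group's software or the method of WO 90/04652. It assumes the detected probe sequences have already been converted to the complementary target-strand k-mers, and that no (k − 1)-mer repeats in the target; `assemble` is a hypothetical name.

```python
def assemble(kmers):
    """Rebuild a sequence from its overlapping k-mers by chaining (k-1)-base overlaps.

    Simplifying assumption: no (k-1)-mer occurs twice in the target, so every
    k-mer has at most one successor and the chain is unambiguous."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    if k < 2:
        raise ValueError("k-mers must be at least 2 bases long")
    by_prefix = {km[:-1]: km for km in kmers}
    suffixes = {km[1:] for km in kmers}
    # The first k-mer is the only one whose prefix is not another k-mer's suffix.
    starts = [km for km in kmers if km[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or cyclic k-mer set")
    seq = starts[0]
    for _ in range(len(kmers) - 1):
        overlap = seq[-(k - 1):]
        if overlap not in by_prefix:
            break
        seq += by_prefix[overlap][-1]
    return seq


print(assemble(["TTGC", "ACGT", "TGCA", "GTTG", "CGTT"]))  # ACGTTGCA
```

Real data with repeated (k − 1)-mers would need a more general approach, for example searching for a path through an overlap graph.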
[0308] Preparation of short peptides on a substrate is described below. 

POLYNUCLEOTIDE FINGERPRINTING 

[0309] The above section on generation of reagents for sequencing provides specific reagents useful for fingerprint- 
ing applications. Fingerprinting embodiments may be applied towards polynucleotide fingerprinting, cell and tissue 
classification, cell and tissue temporal development stage classification, diagnostic tests, forensic uses for individual 
identification, classification of organisms, and genetic screening of individuals. Mapping applications are also described 
below. 

[0310] Polynucleotide fingerprinting may use reagents similar to those described above for probing a sequence for 
the presence of specific subsequences found therein. Typically, the subsequences used for fingerprinting will be longer 
than the sequences used in oligonucleotide sequencing. In particular, specific long segments may be used to determine 
the similarity of different samples of nucleic acids. They may also be used to determine whether specific combinations
of sequences are present therein. Particular probe sequences are selected and attached in a positional manner to a
substrate. Attachment may use either the caged biotin method described or another method using targeting
molecules. In one embodiment, an unnatural nucleotide or similar complementary binding molecule may be
attached to the fingerprinting probe, and the probe thereby directed towards complementary sequences on a VLSIPS
substrate. Typically, unnatural nucleotides would be preferred, e.g., unnatural optical isomers, which would not interfere
with natural nucleotide interactions.

[0311] Having produced a substrate with particular fingerprint probes attached thereto at positionally defined regions, 
the substrate may be used in a manner quite similar to the sequencing embodiment to provide information as to whether 
the fingerprint probes are detecting the corresponding sequence in a target sequence. This will often provide information 
similar to a Southern blot hybridization. 
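One simple way to score the similarity of two nucleic acid samples is to compare which fingerprint probes score positive in each. The Jaccard index used here is an assumed, swapped-in similarity measure, and the probe IDs are hypothetical.

```python
def fingerprint_similarity(positives_a, positives_b):
    """Jaccard index of the probe positions that score positive in two samples (1.0 = identical)."""
    a, b = set(positives_a), set(positives_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


sample_1 = {"P1", "P2", "P3", "P4"}   # hypothetical positive probe IDs
sample_2 = {"P2", "P3", "P4", "P5"}
print(fingerprint_similarity(sample_1, sample_2))  # 0.6
```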

Temporal Development 

Developmental RNA expression patterns 

[0312] The present fingerprinting invention also allows cell classification by identification of developmental RNA
expression patterns. For example, a lymphocyte stem cell expresses a particular combination of RNA species. As the
lymphocyte develops through a programmed developmental scheme, at various stages it expresses particular RNA species
which are diagnostic of particular stages in development. Again, the fingerprinting methodology allows for the definition
of specific structural features which are diagnostic of developmental or functional features, allowing classification
of cells into temporal developmental classes. Cells, products of those cells, or lysates of those cells will be assayed
to determine the developmental stage of the source cells. In this manner, once a developmental stage is defined,
specific synchronized populations of cells may be selected out of another population. These synchronized populations
may be very important in determining the biological mechanisms of development.

[0313] The present invention also allows for fingerprinting of the mRNA population of a cell. In this fashion, the mRNA 
population, which should be a good determinant of developmental stage, will be correlated with other structural features 
of the cell. In this manner, cells at specific developmental stages will be characterized by the intracellular environment, 
as well as the extracellular environment.
Diagnostic Tests 

[0314] The present invention also provides the ability to perform diagnostic tests. Diagnostic tests typically are based 
upon a fingerprint type assay, which tests for the presence of specific diagnostic polynucleotides. Thus, the present 
invention provides means for viral strain identification, bacterial strain identification, and other diagnostic tests using 
positionally defined specific oligonucleotide reagents. 

Viral Identification 

[0315] The present invention provides reagents and methodology for identifying viral strains. The viral genome may 
be probed for specific sequences which are characteristic of particular viral strains. Specific hybridization patterns on
a VLSIPS oligonucleotide substrate can identify the presence of particular viral genomes.

Bacterial Identification 

[0316] Similar techniques will be applicable to identifying a bacterial source. This may be useful in diagnosing bac- 
terial infections, or in classifying sources of particular bacterial species. For example, the bacterial assay may be useful 
in determining the natural range of survivability of particular strains of bacteria across regions of the country or in 
different ecological niches. 

Other Microbiological Identifications 

[0317] The present invention not only provides means for diagnosis of other microbiological and other species, e.g.,
protozoal species and parasitic species, in a biological sample, but also provides the means for assaying a combination
of different infections. For example, a biological specimen may be assayed for the presence of any or all of these
microbiological species. In human diagnostic uses, typical samples will be blood, sputum, stool, urine, or other samples.
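Strain identification and detection of mixed infections can be sketched as checking which characteristic probe sets are fully positive. The probe IDs, organism names and the all-probes-positive rule are assumptions for illustration.

```python
def identify_organisms(positive_probes, signatures):
    """Return every organism whose characteristic probes all hybridized.

    signatures maps an organism or strain name to its characteristic probe IDs;
    more than one name can be returned when a sample carries mixed infections."""
    positives = set(positive_probes)
    return sorted(name for name, probes in signatures.items() if set(probes) <= positives)


signatures = {                      # hypothetical probe sets
    "virus strain A": {"p1", "p2"},
    "virus strain B": {"p1", "p5"},
    "bacterium C": {"p3", "p4"},
}
print(identify_organisms({"p1", "p2", "p3", "p4", "p9"}, signatures))
# ['bacterium C', 'virus strain A']
```

Requiring every characteristic probe to be positive keeps closely related strains (here A and B, which share p1) apart.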

Individual Identification 

[0318] The present invention provides the ability to fingerprint and identify a genetic individual. This individual may
be a bacterium or other lower microorganism, as described above for diagnostic tests, or a plant or animal. An individual
may be identified genetically, as described.

[0319] Genetic fingerprinting has been utilized in comparing different related species in Southern hybridization blots.
Genetic fingerprinting has also been used in forensic studies, see, e.g., Morris et al. (1989) J. Forensic Science 34:
1311-1317, and references cited therein. As described above, an individual may be identified genetically by a sufficiently
large number of probes. The likelihood that another individual would have an identical pattern over a sufficiently large
number of probes may be statistically negligible. However, where the statistical probability of a chance match must be
particularly low, it is important that a large number of probes be used. In fact, the probes will optimally be
selected for having high heterogeneity among the population. In addition, the fingerprint method may make use of the
pattern of homologies revealed by a series of increasingly stringent washes. Each position then has both a
sequence specificity and a homology measurement, the combination of which greatly increases the number of dimensions
and thereby reduces the statistical likelihood of a perfect pattern match with another genetic individual.
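The statistical argument of paragraph [0319] can be made concrete: if probes are independent, the chance that an unrelated individual matches every probe is the product of the per-probe match frequencies. Independence and the frequencies below are hypothetical assumptions, and real loci may be correlated. `math.prod` requires Python 3.8 or later.

```python
from math import prod


def random_match_probability(per_probe_match):
    """Chance that an unrelated individual matches at every probe, assuming independent probes."""
    return prod(per_probe_match)


# Hypothetical: 20 probes, each shared by half the population (positive/negative only).
binary = random_match_probability([0.5] * 20)
# Hypothetical: the same 20 probes read at four equally likely homology grades
# (e.g., from washes of increasing stringency).
graded = random_match_probability([0.25] * 20)
print(f"{binary:.2e} {graded:.2e}")  # 9.54e-07 9.09e-13
```

Adding a homology grade to each position multiplies the number of distinguishable patterns, which is why the chance-match probability drops by several orders of magnitude.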

Genetic Screening 

1. test alleles with markers

[0320] The present invention provides for the ability to screen for genetic variations of individuals. For example, a
number of genetic diseases are linked with specific alleles. See, e.g., Scriver, C. et al. (eds.) (1989) The Metabolic
Basis of Inherited Disease, McGraw-Hill, New York. In particular, cystic fibrosis has been correlated with a
specific gene; see Gregory et al. (1990) Nature 347:382-386. A number of alleles are correlated with specific genetic
deficiencies. See, e.g., McKusick, V. (1990) Mendelian Inheritance in Man: Catalogs of Autosomal Dominant, Autosomal
Recessive, and X-linked Phenotypes, Johns Hopkins University Press, Baltimore; Ott, J. (1985) Analysis of Human
Genetic Linkage, Johns Hopkins University Press, Baltimore; Track, R. et al. (1989) Banbury Report 32: DNA Technology
and Forensic Science, Cold Spring Harbor Press, New York.
2. Amniocentesis

[0321] Typically, amniocentesis is used to determine whether chromosome translocations have occurred. The
mapping procedure may provide the means for determining whether these translocations have occurred, and for detecting
particular alleles of various markers.

MAPPING 

Positionally Located Clones 


[0322] The present invention allows for the positional location of specific clones useful for mapping. For example,
caged biotin may be used for specifically positioning a probe to a location on a matrix pattern.
[0323] In addition, the specific probes may be positionally directed to specific locations on a substrate by targeting.
For example, polypeptide specific recognition reagents may be attached to oligonucleotide sequences which can be
complementarily targeted, by hybridization, to specific locations on a VLSIPS substrate. Hybridization conditions, as
applied for oligonucleotide probes, will be used to target the reagents to locations on a substrate having complementary
oligonucleotides synthesized thereon. In another embodiment, oligonucleotide probes may be attached to specific
polypeptide targeting reagents such as an antigen or antibody. These reagents can be directed towards a
complementary antigen or antibody already attached to a VLSIPS substrate.

[0324] In another embodiment, an unnatural nucleotide which does not interfere with natural nucleotide complementary
hybridization may be used to target oligonucleotides to particular positions on a substrate. Unnatural optical isomers
of natural nucleotides should be ideal candidates.

[0325] In this way, short probes may be used to determine the mapping of long targets, or long targets may be used
to map the position of shorter probes. See, e.g., Craig et al. (1990) Nucleic Acids Res. 18:2653-2660.


Positionally Defined Clones 

[0326] Positionally defined clones may be transferred to a new substrate by either physical transfer or by synthetic 
means. Synthetic means may involve either a production of the probe on the substrate using the VLSIPS synthetic 
methods, or may involve the attachment of a targeting sequence made by VLSIPS synthetic methods which will target
that positionally defined clone to a position on a new substrate. Both methods will provide a substrate having a number 
of positionally defined probes useful in mapping. 

CONCLUSION 


[0327] The present inventions provide greatly improved methods and apparatus for synthesis of polymers on
substrates. It is to be understood that the above description is intended to be illustrative and not restrictive. Many
embodiments will be apparent to those of skill in the art upon reviewing the above description. By way of example, the
invention has been described primarily with reference to the use of photoremovable protective groups, but it will be readily
recognized by those of skill in the art that sources of radiation other than light could also be used. For example, in
some embodiments it may be desirable to use protective groups which are sensitive to electron beam irradiation or
x-ray irradiation, in combination with electron beam lithography or x-ray lithography techniques. Alternatively, the group
could be removed by exposure to an electric current. The scope of the invention should, therefore, be determined not
with reference to the above description, but should instead be determined with reference to the appended claims, along
with the full scope of equivalents to which such claims are entitled.

Claims 

1. A method for detecting nucleic acid sequences in two or more collections of nucleic acids, comprising:

(a) providing an array comprising more than 100 different polynucleotide probes bound to a solid surface; 

(b) contacting said array of probes under hybridisation conditions with: 


(i) a first collection of nucleic acids comprised of first-labelled nucleic acids having at least some sequences
complementary to probes of said array, and 
(ii) at least a second collection of nucleic acids comprised of second-labelled nucleic acids having at least
some sequences complementary to probes of said array, 

wherein said first and second labels are distinguishable from each other; and 

(c) detecting hybridisation of first and second labelled complementary nucleic acids to probes of said array.

2. A method as claimed in claim 1, wherein said first and second labels are fluorescent labels that emit light of different
wavelengths.

3. A method as claimed in claim 2 used to fingerprint at least first and second cells, wherein said first collection of
nucleic acids is from a first cell and said second collection of nucleic acids is from a second cell, and fluorescence
of said first and second labels hybridised to the array is detected, optionally said method further comprising:

(a) determining levels of gene expression in said first and second cells, 

(b) determining patterns of gene expression in said first and second cells, or 

(c) determining genetic differences between said first and second cells. 
4. A method as claimed in claim 3, wherein said first and second cells are different types of cells, optionally wherein:

(a) at least one cell type is a tumour cell or other cell exhibiting abnormal physiology, 

(b) said first and second cells are at different stages of development, 

(c) said first and second cells are at different stages of infection or other disease, or 

(d) said first and second cells are from different species of organism, optionally wherein said organism is an 
animal, plant or microorganism. 

5. A method as claimed in claim 3 or claim 4, wherein at least one collection of nucleic acids is synthesized by
fluorescently labelling: 

(a) RNA isolated, generated or amplified from said cell; or 

(b) DNA isolated, generated or amplified from said cell. 

6. A method as claimed in any one of claims 1 to 5, wherein said solid surface is a polymeric substrate or includes
fibers. 

7. A method as claimed in any one of claims 1 to 6, wherein said probes are bound at a density of at least 10³,
preferably at least 10⁴, more preferably at least 10⁵, even more preferably at least 10⁶ regions per cm² to known
regions on the solid surface.

8. A method as claimed in any one of claims 1 to 5, wherein said solid surface is formed as a collection of beads and
each different polynucleotide probe is bound to a single bead. 

9. A method as claimed in claim 8, wherein a bead further has an encoding system bound thereto such that the
sequence of the polynucleotide bound to a bead can be determined by decoding the encoding system, optionally 
wherein said encoding system is selected from the group consisting of a magnetic system, shape encoding system, 
colour encoding system, or combination thereof. 

10. A method as claimed in claim 8 or claim 9, wherein an automated cell sorter is used to detect hybridisation.

11. A method as claimed in any one of claims 1 to 10, wherein said array is comprised of more than 10³, preferably
more than 10⁴, more preferably more than 10⁵, even more preferably more than 10⁶ different probes bound to the
solid surface.
12. A method as claimed in any preceding claim, wherein said probes are greater than about 15, preferably greater
than about 25, more preferably greater than about 50 nucleotides in length. 

13. A method as claimed in any preceding claim, wherein at least said two collections of nucleic acids are hybridised 
to the same array of said probes. 

14. A method as claimed in claim 13, wherein at least said two collections of nucleic acids are hybridised separately 
or simultaneously to the same array of said probes. 

15. A method as claimed in any preceding claim, wherein said array has been recycled for use. 

16. A method as claimed in any preceding claim, wherein the sequences of the polynucleotide probes of the array are 
known. 



Claims (German text)

1. A method for detecting nucleic acid sequences in two or more groups of nucleic acids, comprising:

(a) providing an array having more than 100 different polynucleotide probes bound to a solid surface;

(b) contacting the array of probes under hybridisation conditions with:

(i) a first group of nucleic acids comprising nucleic acids labelled with a first label, which have at least
some sequences complementary to probes of the array, and

(ii) at least a second group of nucleic acids comprising nucleic acids labelled with a second label, which have
at least some sequences complementary to probes of the array,

wherein the first and second labels are distinguishable from each other; and

(c) detecting the hybridisation of the complementary nucleic acids labelled with a first label and with a second
label to probes of the array.

2. The method according to claim 1, wherein the first and second labels are fluorescent labels that emit light of
different wavelengths.

3. The method according to claim 2, used to produce a fingerprint of at least first and second cells, wherein the
first group of nucleic acids originates from a first cell and the second group of nucleic acids from a second cell,
and wherein the fluorescence of the first and second labels hybridised to the array is detected, the method
optionally further comprising:

(a) determining the level of gene expression in the first and second cells,

(b) determining patterns of gene expression in the first and second cells, or

(c) determining genetic differences between the first and second cells.

4. The method according to claim 3, wherein the first and second cells are different cell types, wherein optionally:

(a) at least one cell type is a tumour cell or another cell with abnormal physiology,

(b) the first and second cells are at different stages of development,

(c) the first and second cells are at different stages of an infection or another disease, or

(d) the first and second cells originate from different species of organism,

wherein the organism is optionally an animal, a plant or a microorganism.

5. The method according to claim 3 or 4, wherein at least one group of nucleic acids is synthesised by fluorescence
labelling of:

(a) RNA that has been isolated, generated or amplified from the cell; or

(b) DNA that has been isolated, generated or amplified from the cell.

6. The method according to any one of claims 1 to 5, wherein the solid surface is a polymeric support or comprises
fibres.

7. The method according to any one of claims 1 to 6, wherein the probes are bound at a density of at least 10³,
preferably at least 10⁴, more preferably at least 10⁵ and in particular at least 10⁶ regions per cm² to known regions
of the solid surface.

8. The method according to any one of claims 1 to 5, wherein the solid surface is formed as a group of beads and each
different polynucleotide probe is bound to a single bead.

9. The method according to claim 8, wherein a bead further has an encoding system bound thereto, such that the
sequence of the polynucleotide bound to a bead can be determined by decoding the encoding system, wherein the
encoding system is optionally selected from the group consisting of a magnetic system, a shape encoding system, a
colour encoding system or a combination thereof.

10. The method according to claim 8 or 9, wherein an automated cell sorting device is used to detect the
hybridisation.

11. The method according to any one of claims 1 to 10, wherein the array comprises more than 10³, preferably more
than 10⁴, more preferably more than 10⁵ and in particular more than 10⁶ different probes bound to the solid surface.

12. The method according to any one of the preceding claims, wherein the probes have a length of more than about 15,
preferably more than about 25 and in particular more than about 50 nucleotides.

13. The method according to any one of the preceding claims, wherein at least the two groups of nucleic acids are
hybridised to the same array of probes.

14. The method according to claim 13, wherein at least the two groups of nucleic acids are hybridised separately or
simultaneously to the same array of probes.

15. The method according to any one of the preceding claims, wherein the array has been recycled for use.

16. The method according to any one of the preceding claims, wherein the sequences of the polynucleotide probes of
the array are known.



Claims (French text)

1. A method for detecting nucleic sequences in two or more collections of nucleic acids, comprising:

(a) providing an array comprising more than 100 different polynucleotide probes bound to a solid surface;

(b) contacting said array of probes under hybridisation conditions with:

(i) a first collection of nucleic acids composed of nucleic acids functionalised with a first label, having at
least some sequences complementary to the probes of said array, and

(ii) at least a second collection of nucleic acids composed of nucleic acids functionalised with a second label,
having at least some sequences complementary to the probes of said array,

said first and second labels being distinct from each other; and

(c) detecting hybridisation of the complementary nucleic acids functionalised with the first and second labels to
the probes of said array.

2. The method according to claim 1, characterised in that the first and second labels are fluorescent labels which
emit light at different wavelengths.

3. The method as claimed in claim 2, used to label at least so-called first and second cells, in which said first
collection of nucleic acids is derived from a first cell and said second collection of nucleic acids is derived from a
second cell, and the fluorescence of said first and second labels hybridised to the array is detected, said method
optionally further comprising:

(a) determining the level of gene expression in said first and second cells,

(b) determining the profiles of genes expressed in said first and second cells, or

(c) determining the genetic differences between said first and second cells.

4. The method as claimed in claim 3, in which the first and second cells are cells of different types, optionally
characterised in that:

(a) at least one cell type is a tumour cell or another cell exhibiting abnormal physiology,

(b) said first and second cells are at different stages of development,

(c) said first and second cells are at different stages of infection or other pathology, or

(d) said first and second cells are derived from different species of organism, optionally characterised in that
the organism is an animal, a plant or a microorganism.

5. The method as claimed in claim 3 or 4, characterised in that at least one collection of nucleic acids is
synthesised by fluorescent labelling:

(a) of RNA isolated, generated or amplified from said cell; or

(b) of DNA isolated, generated or amplified from said cell.

6. The method as claimed in any one of claims 1 to 5, characterised in that the solid surface is a polymeric substrate
or incorporates fibres.

7. The method as claimed in any one of claims 1 to 6, characterised in that said probes are bound at a density of at
least 10³, preferably at least 10⁴, more preferably at least 10⁵, and even more preferably at least 10⁶ regions per cm²
to defined regions of the solid surface.

8. The method as claimed in any one of claims 1 to 5, characterised in that said solid surface has a plurality of
raised features and in that each different polynucleotide probe is bound to a single raised feature.

9. The method as claimed in claim 8, characterised in that a raised feature further has an encoding system bound to
its surface such that the sequence of the polynucleotide bound to the raised feature can be determined by decoding
the encoding system, and in that the encoding system is optionally selected from the group consisting of a magnetic
system, a shape-encoding system, a colour-encoding system, or a combination thereof.

10. The method as claimed in claim 8 or 9, characterised in that an automatic cell sorter is used to detect the
hybridisation.

11. The method as claimed in any one of claims 1 to 10, characterised in that said array comprises more than 10³,
preferably more than 10⁴, more preferably more than 10⁵, and even more preferably more than 10⁶ different probes
bound to the solid surface.

12. The method as claimed in any one of the preceding claims, characterised in that said probes are more than 15,
preferably more than about 25, and more preferably more than about 50 nucleotides in length.

13. The method as claimed in any one of the preceding claims, characterised in that at least two collections of
nucleic acids are hybridised to the same array of said probes.

14. The method as claimed in claim 13, characterised in that at least two collections of nucleic acids are hybridised
separately or simultaneously at the same level of said probes.

15. The method as claimed in any one of the preceding claims, characterised in that said array has been recycled for
use.

16. The method as claimed in any one of the preceding claims, characterised in that the sequences of the
polynucleotide probes of the array are known.